CONSERVED AND SPECIFIC STREPTOCOCCAL GENOMES

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority of U.S. provisional patent application Serial No. 
60/406,237, filed August 26, 2002, U.S. provisional patent application Serial No. 60/406,676, 
filed August 27, 2002 and U.S. provisional patent application Serial No. 60/406,757, filed 
August 28, 2002. 

FIELD OF THE INVENTION

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. The 
conserved or specific genomic regions can be used to identify, screen and develop vaccines and 
other treatments for Streptococcal infections and can be used in diagnostic assays to diagnose
and identify Streptococcal infections.

BACKGROUND OF THE INVENTION 

The genus Streptococcus consists of Gram-positive, chain-forming, spherical bacterial
cells. Three species of clinical interest are S. pneumoniae ("pneumococcus" or "S. pn."),
S. pyogenes ("group A streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or
"GBS"). Infections with these three pathogenic streptococci lead to conditions including
pharyngitis, toxic shock syndrome and necrotizing fasciitis.

Once thought to infect only cows, GBS is now known to cause serious disease,
bacteraemia and meningitis in immunocompromised individuals and neonates. There are two
known types of neonatal infection. The first (early onset, usually within 5 days of birth) is
manifested by bacteraemia and infection. It is generally contracted vertically as a baby passes
through the birth canal. GBS is thought to colonize the vagina of about 25% of young women;
approximately 1% of infants born via a vaginal birth to colonised mothers will become infected.
Mortality resulting from these infections is between 50 - 70%. The second type of neonatal
infection is a meningitis that occurs 10 to 60 days after birth. If pregnant women are vaccinated
with type III capsule so that the infants are passively immunised, the incidence of the late onset
meningitis is generally reduced, although not entirely eliminated.

The "B" in "GBS" refers to the Lancefield classification, which is based on the
antigenicity of a carbohydrate which is soluble in dilute acid and called the C carbohydrate.
Lancefield identified 13 types of C carbohydrate, designated A to O, that could be serologically
differentiated. The organisms that most commonly infect humans are found in groups A, B, D,
and G. Within group B, strains can be divided into at least 9 serotypes (Ia, Ib, II, III, IV, V, VI,
VII, and VIII) based on the structure of their polysaccharide capsule. Further categories based
on, for example, the expression of certain proteins have also been developed.

GBS strains of polysaccharide capsule Type V were rarely isolated before the mid-1980s
but now account for approximately one-third of clinical isolates in the US. Type V is the most
common capsular serotype associated with invasive infection in nonpregnant adults, and the
emergence of Type V strains over the past decade has been temporally linked to an increase in
GBS disease in this population.

Group A streptococcus is a frequent human pathogen, estimated to be present in between
5-15% of normal individuals without signs of disease. When host defences are compromised,
or when the organism is able to exert its virulence, or when it is introduced into vulnerable
tissues or hosts, however, an acute infection occurs. Diseases include puerperal fever, scarlet
fever, erysipelas, pharyngitis, impetigo, necrotising fasciitis, myositis and streptococcal toxic
shock syndrome.

Pneumococcus is the most common cause of acute respiratory infection and otitis media
and is estimated to result in over 3 million deaths in children every year worldwide from
pneumonia, bacteremia, or meningitis. Even more deaths occur among elderly people, among
whom S. pn. is the leading cause of community-acquired pneumonia and meningitis. Since
1990, the number of penicillin-resistant strains has increased from 1 to 5% to 25 to 80% of
isolates, and many strains are now resistant to commonly prescribed antibiotics such as
penicillin, macrolides, and fluoroquinolones. See Tettelin, et al. (2001) Science 293, 498-506.

The complete genomic sequence of a virulent isolate of S. pneumoniae was published by
Tettelin, et al. (2001) Science 293, 498-506 and is available at the TIGR website at
http://www.tigr.org, as well as on GenBank (available through the PubMed website at
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi). The genomic sequence, the Tettelin article and
its published supplemental material are incorporated herein by reference in their entirety.

The complete genomic sequence of an M1 strain of S. pyogenes was published by
Ferretti, et al. (2001) Proc. Natl. Acad. Sci. USA 98, 4658 - 4663 and is available at the TIGR
website at http://www.tigr.org. The genomic sequence, the Ferretti article and its published
supplemental materials are incorporated herein by reference in their entirety.

The complete genomic sequence of a serotype V strain of S. agalactiae (type V strain
2603 V/R) was published on August 28, 2002 at GenBank Accession no. AE009948 (available
through PubMed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi) and/or was available on the
same day at the TIGR website at http://www.tigr.org. Most of this sequence is also available in
PCT International Patent Application Publication WO 02/34771. The genomic sequence, the
Tettelin article and its published supplemental materials are incorporated herein by reference in
their entirety.

Current treatments for Streptococcal infections include both antibiotics and prophylactic
vaccination. Current vaccines, particularly with respect to GBS, suffer from poor
immunogenicity, while the emergence of antibiotic resistant strains has lessened the
effectiveness of currently used antibiotics. Accordingly, there is an increasing need for the
development of new vaccines and antibiotics (as well as other small molecule bacterial
inhibitors) to help prevent and treat Streptococcal infections.

Applicants have identified regions of the Streptococcal genomes which can be used to
identify and develop new vaccines and treatments for Streptococcal infections. Specifically,
Applicants have identified polynucleotides of the Streptococcal genome which are conserved or
specific to Streptococcal species, species serotypes, and/or specific serotype isolates. These
polynucleotides and their expressed polypeptides can be used to screen, develop and design new
vaccines, antibiotics and other small molecule bacterial inhibitors. These polynucleotides and
their expressed polypeptides can further be used to diagnose and identify Streptococcal infections.

SUMMARY OF THE INVENTION

The invention relates to polynucleotides which are conserved or specific to one or more
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular,
the invention relates to polynucleotides from Streptococcus which are conserved or specific to
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The
invention further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII.
The invention still further relates to polynucleotides which are conserved or specific to one or
more clinical isolates of a Streptococcus species.

The invention is based on the identification of the following Subsets of genes. Genes
falling within each subset are described with respect to referenced tables, lists, and/or figures (in
particular the CGH map depicted in Figure 1).

The following Subsets relate to the GBS genome:

GBS Subset 1: 1060 GBS genes which have homologues with GAS and with
pneumococcus (Table 8);

GBS Subset 2: 225 GBS genes which have homologues with GAS, but not with
pneumococcus (Table 10);

GBS Subset 3: 176 GBS genes which have homologues with pneumococcus but not
with GAS (Table 9);

GBS Subset 4: 683 GBS genes which do not have homologues with GAS or
pneumococcus (specific to GBS vs GAS and pneumococcus) (Table 11).
The invention is based on the identification of the following subsets of genes within the
GAS genome:

GAS Subset 1: 1006 GAS genes which have homologues with GBS and with
pneumococcus (Table 33);

GAS Subset 2: 212 GAS genes which have homologues with GBS but do not have
homologues with pneumococcus (Table 34);

GAS Subset 3: 62 GAS genes which have homologues with pneumococcus but do not
have homologues with GBS (Table 35);

GAS Subset 4: 416 GAS genes which do not have homologues with either GBS or
pneumococcus. This Subset can be determined by subtracting the above subsets from the
published genome.

The invention is based on the identification of the following subsets of genes within the 
pneumococcus genome: 

Spn Subset 1: 1034 Spn genes which have homologues with GBS and GAS (Table 36); 

Spn Subset 2: 195 Spn genes which have homologues with GBS but do not have
homologues with GAS (Table 37);

Spn Subset 3: 74 Spn genes which have homologues with GAS but do not have
homologues with GBS (Table 38);

Spn Subset 4: 836 Spn genes which do not have homologues with either GBS or
GAS. This Subset can be determined by subtracting the above Subsets from the
published genome.
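
The four-way partition used for each genome can be illustrated with a short sketch. It assumes, purely for illustration, that homology results are already available as two sets of gene identifiers (one per comparison species); the function name, the toy identifiers and the data layout are hypothetical and are not taken from the specification. Only the subset sizes are those stated above for GBS.

```python
from collections import Counter


def classify(genes, hits_species_a, hits_species_b):
    """Assign each gene to Subset 1-4 from its homologue hits.

    Subset 1: homologue in both comparison species
    Subset 2: homologue in species A only
    Subset 3: homologue in species B only
    Subset 4: homologue in neither (specific to the query genome)
    """
    result = {}
    for gene in genes:
        in_a = gene in hits_species_a
        in_b = gene in hits_species_b
        if in_a and in_b:
            result[gene] = 1
        elif in_a:
            result[gene] = 2
        elif in_b:
            result[gene] = 3
        else:
            result[gene] = 4
    return result


# Toy data: hypothetical identifiers, not real ORFs.
genes = ["g1", "g2", "g3", "g4", "g5"]
gas_hits = {"g1", "g2"}   # genes with a GAS homologue (assumed input)
spn_hits = {"g1", "g3"}   # genes with a pneumococcus homologue (assumed input)
subsets = classify(genes, gas_hits, spn_hits)
counts = Counter(subsets.values())

# Subset sizes stated in the text for the GBS genome.
GBS_SUBSET_SIZES = {1: 1060, 2: 225, 3: 176, 4: 683}
gbs_total = sum(GBS_SUBSET_SIZES.values())
```

Because the four categories are mutually exclusive and exhaustive, Subset 4 can equally be obtained by subtracting Subsets 1 to 3 from the full gene list, which is how the text describes GAS Subset 4 and Spn Subset 4.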

The invention further provides polynucleotides which are conserved or specific to
Streptococcus based on a comparison with a wide range of published bacterial genomes. The
following additional Subsets are provided:

GBS Subset 1(a): Of the 1060 GBS genes which have homologues in both GAS and
pneumococcus, 12 of those GBS genes do not have homologues with any of the other published
bacterial genomes at the time of the invention (i.e., GBS Subset 1(a) is specific to Streptococcus
vs non-Streptococcus published genomes). (The 12 GBS ORFs are listed in Table 3).

GBS Subset 2(a): This Subset comprises GBS genes which have homologues with
GAS, but not with pneumococcus or any other published bacterial genomes at the time of the
invention.

GBS Subset 3(a): This Subset comprises GBS genes which have homologues with
pneumococcus, but not with GAS or any other published bacterial genomes at the time of the
invention.

GBS Subset 4(a): Of the 683 GBS genes which do not have homologues in either GAS
or pneumococcus, 315 of these GBS genes also do not have homologues with any of the other
published bacterial genomes. These include six proteins predicted to be anchored on the cell
wall (SAG0677, SAG0771, SAG1052, SAG1331, SAG1473, and SAG1168), three of the
capsule-related genes (SAG1163, SAG1167, and SAG1168), six transcriptional regulators, and
four genes of the cyl operon (SAG0663 - SAG0673) essential for GBS hemolytic activity and
production of pigment. See Pritzlaff et al. (2001) Mol. Microbiol. 39, 236 - 247. The rest of the
315 proteins include 240 hypothetical proteins with no similarity to other proteins in databases.

Many of the 315 genes specific to S. agalactiae are located in regions likely to constitute
mobile genetic elements. Two of these regions resemble prophages (SAG0545-SAG0610 and
SAG1835-SAG1885) displaying a mosaic structure with segments most similar to different
bacteriophages, a pattern that suggests frequent recombination events. PblA and PblB are
adhesins from a S. mitis prophage where they contribute to endocarditis by binding to human
platelets (See Bensing, et al. (2001) Infect. Immun. 69, 6186 - 6192; Bensing, et al. (2001) Infect.
Immun. 69, 1373 - 1380). Their orthologs in S. agalactiae are located on separate prophages and
display a different protein structure. Another region (SAG1247-SAG1299) encodes a putative
conjugative transposon that carries genes for cadmium efflux and mercury resistance.

GAS Subset 1(a): This Subset comprises GAS genes which have homologues with GBS
and with pneumococcus, but do not have homologues with any of the other published bacterial
genomes at the time of the invention.

GAS Subset 2(a): This Subset comprises GAS genes which have homologues with GBS
but do not have homologues with pneumococcus or any of the other published bacterial genomes
at the time of the invention.

GAS Subset 3(a): This Subset comprises GAS genes which have homologues with
pneumococcus but do not have homologues with GBS or any of the other published bacterial
genomes at the time of the invention.

GAS Subset 4(a): This Subset comprises GAS genes which do not have homologues
with either GBS or pneumococcus or with any of the other published bacterial genomes at the
time of the invention.

Spn Subset 1(a): This Subset comprises Spn genes which have homologues with GBS
and GAS but which do not have homologues with any of the other published bacterial genomes
at the time of the invention;

Spn Subset 2(a): This Subset comprises Spn genes which have homologues with GBS
but do not have homologues with GAS or with any of the other published bacterial genomes at
the time of the invention;

Spn Subset 3(a): This Subset comprises Spn genes which have homologues with GAS
but do not have homologues with GBS or with any of the other published bacterial genomes at
the time of the invention;

Spn Subset 4(a): This Subset comprises Spn genes which do not have homologues with
either GBS or GAS or with any of the other published bacterial genomes at the time of
the invention.

The invention also provides polynucleotides which are conserved or specific to GBS
serotypes and/or clinical isolates. Applicants have sequenced 19 GBS genes from a variety of
GBS serotypes in 11 different clinical isolates. The sequences of these genes and their
alignments are set forth in Tables 13-31. Polynucleotide and polypeptide sequences which are
specific or conserved across one or more clinical isolates can be identified using these
alignments. The following additional subsets are provided:
GBS Subset 1(b): of the 1060 GBS genes which have homologues with GAS and with
pneumococcus, 47 of these GBS genes vary among the 11 clinical isolates (GBS Subset 1(b)(i)).
1013 of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 1(b)(ii)).
These lists can be determined by comparing the genes listed in Table 8 with the Comparative
Genome Hybridization in Figure 1.

GBS Subset 2(b): of the 225 GBS genes which have homologues with GAS, but not
pneumococcus, 44 of these GBS genes vary among the 11 clinical isolates (GBS Subset 2(b)(i)).
181 of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 2(b)(ii)).
These lists can be determined by comparing the genes listed in Table 10 with the Comparative
Genome Hybridization in Figure 1.

GBS Subset 3(b): of the 176 GBS genes which have homologues with pneumococcus,
44 of these GBS genes vary among the 11 clinical isolates (GBS Subset 3(b)(i)). 132 of these GBS
genes are conserved across the 11 clinical isolates (GBS Subset 3(b)(ii)). These lists can be
determined by comparing the genes listed in Table 9 with the Comparative Genome
Hybridization in Figure 1.

GBS Subset 4(b): of the 683 GBS genes which do not have homologues with GAS or
pneumococcus, 260 GBS genes vary among the 11 clinical isolates (GBS Subset 4(b)(i)). 423
of these GBS genes are conserved across the 11 clinical isolates (GBS Subset 4(b)(ii)). These lists
can be determined by comparing the genes listed in Table 11 with the Comparative Genome
Hybridization in Figure 1. GBS Subset 4(b)(ii) also includes the GBS ORFs listed in Table 12
that are marked under the column "GBS specific".
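
The split of each subset into a variable part (i) and a conserved part (ii) can be sketched as follows. The sketch assumes, hypothetically, that the CGH result is available as a mapping from each gene to one detected/not-detected flag per tested isolate, and that a gene counts as conserved only when it is detected in every isolate; neither the data layout nor the function name comes from the specification. The counts checked at the end are those stated above.

```python
def split_by_cgh(subset_genes, detected):
    """Split a subset into (variable, conserved) gene lists.

    `detected` maps gene -> list of booleans, one per tested isolate
    (an assumed representation of the CGH map).
    """
    variable, conserved = [], []
    for gene in subset_genes:
        if all(detected[gene]):
            conserved.append(gene)   # detected in every isolate
        else:
            variable.append(gene)    # missing in at least one isolate
    return variable, conserved


# Toy CGH flags for three isolates (hypothetical gene names).
toy_detected = {
    "geneA": [True, True, True],
    "geneB": [True, False, True],
    "geneC": [False, False, False],
}
toy_variable, toy_conserved = split_by_cgh(
    ["geneA", "geneB", "geneC"], toy_detected
)

# (total, variable, conserved) as stated for GBS Subsets 1(b)-4(b).
STATED_COUNTS = {
    "1(b)": (1060, 47, 1013),
    "2(b)": (225, 44, 181),
    "3(b)": (176, 44, 132),
    "4(b)": (683, 260, 423),
}
counts_consistent = all(
    total == var + cons for total, var, cons in STATED_COUNTS.values()
)
```

In each of the four subsets the stated variable and conserved counts add up to the stated subset size.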

An additional 63 GBS genes have been sequenced and compared in 2 - 11 clinical
isolates. These sequences and their alignments are provided in Tables 40 - 89. Polynucleotide
and polypeptide sequences which are specific or conserved across one or more clinical isolates
can be identified using these alignments.

The invention further provides polynucleotides which are likely recent genomic
duplications in GBS. These duplications include glycosyl transferases, sortases, proteins
anchored on the cell wall, β-lactam resistance factors, and many hypothetical proteins. The GBS
genes are listed in Table 4 (GBS Subset 5).
The invention is also based on the identification of a cluster of 13 adjacent genes
(SAG1410 - SAG1424) which is believed to encode enzymes required for synthesis of the group
B carbohydrate, a complex multiantennary structure of rhamnose, glucitol phosphate,
N-acetylglucosamine, and galactose (GBS Subset 6). Predicted proteins encoded within this
cluster include seven putative glycosyltransferases, four of which are similar to
rhamnosyltransferases in other streptococcal species; a putative dTDP-L-rhamnose synthase; and
proteins involved in glucitol synthesis. All nine recognized GBS capsular polysaccharide types
contain sialic acid residues as part of their repeating unit structure, a feature that contributes to
virulence by inhibiting activation of the alternative complement pathway. See Edwards et al.
(1982) J. Immunol. 128, 1278 - 1283.

The type V capsular polysaccharide gene cluster consists of 18 genes (GBS Subset
6(a)). A region of glycosyltransferases and related proteins (SAG1162 - SAG1170) that direct
the synthesis of the type V polysaccharide repeat unit is flanked on either side by genes that are
conserved in all known GBS capsule serotypes. Downstream of this region are genes that
encode enzymes for the biosynthesis and activation of sialic acid (SAG1158 - SAG1161).
Upstream of the serotype specific region are genes (SAG1171 - SAG1175) found not only in all
nine GBS capsular serotypes but also in a variety of other polysaccharide-producing
streptococci.

The invention is also based on the identification of GBS ORFs predicted to encode
proteins carrying a signal peptide (GBS Subset 7). These GBS ORFs are listed in Table 2
and marked under the column "signal peptide".

The invention is also based on the identification of GBS ORFs predicted to encode
proteins which are anchored on the cell wall through an LPxTG motif (GBS Subset 8). These
GBS ORFs are listed in Table 2 and marked under the column "sortase motif".

The invention is also based on the identification of GBS ORFs predicted to encode
lipoproteins (GBS Subset 9). These GBS ORFs are listed in Table 2 and marked under the
column "lipoprotein".

The invention is also based on the identification of two GBS ORFs predicted to encode
enzymes related to metabolism (GBS Subset 10). These GBS ORFs include a putative
pullulanase (SAG1216) and a neuraminidase-related protein (SAG1932).

The invention is also based on the identification of GBS ORFs predicted to encode
proteins exposed on the cell surface (GBS Subset 11). These GBS ORFs are listed in Table 2
receiving a "+" under the column "FACS".

The invention is also based on the identification of 401 GBS ORFs from GBS strain
2603 V/R which were not detected in at least one other of the 11 tested clinical isolates (GBS
Subset 12). See the Comparative Genome Hybridization in Figure 1. 364 of these 401 ORFs
correspond to 15 regions containing more than 5 contiguous genes. Each region is identified in
Figure 1 by numerical yellow bullets. Each region comprises a subset as defined below:

Region 1: GBS Subset 12(a). This region is unique to GBS (SAG0218 - SAG0238).
This region is a possible plasmid or remnant of a phage and contains mostly hypothetical
proteins.

Region 2: GBS Subset 12(b)

Region 3: GBS Subset 12(c)

Region 4: GBS Subset 12(d)

Region 5: GBS Subset 12(e)

Region 6: GBS Subset 12(f)

Region 7: GBS Subset 12(g)

Region 8: GBS Subset 12(h). This region is specific to GBS (SAG1018 - SAG1037).
This region comprises 20 proteins of unknown function, most of which are predicted to be
membrane associated or secreted, and displays an atypical nucleotide composition.

Region 9: GBS Subset 12(i)

Region 10: GBS Subset 12(j)

Region 11: GBS Subset 12(k)

Region 12: GBS Subset 12(l)

Region 13: GBS Subset 12(m)

Region 14: GBS Subset 12(n). This region is unique to GBS and spans 33 genes
(SAG1989 - SAG2021), including 25 proteins of unknown function, some of which carry a cell-wall
anchor.

Region 15: GBS Subset 12(o).
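
Grouping undetected ORFs into regions of more than 5 contiguous genes can be sketched as a simple run-finding pass. The sketch assumes, for illustration only, that ORFs are represented by the integer part of their SAG number and that consecutive numbers mean adjacent genes; the function names and the toy input are hypothetical. The two size checks at the end use region boundaries stated above.

```python
def contiguous_regions(orf_numbers, min_len=6):
    """Return (first, last) for every run of consecutive ORF numbers
    containing at least `min_len` genes (i.e. more than 5)."""
    regions = []
    run = []
    for n in sorted(set(orf_numbers)):
        if run and n == run[-1] + 1:
            run.append(n)            # extends the current run
        else:
            if len(run) >= min_len:
                regions.append((run[0], run[-1]))
            run = [n]                # start a new run
    if len(run) >= min_len:
        regions.append((run[0], run[-1]))
    return regions


def region_size(first, last):
    """Number of ORFs in an inclusive range of ORF numbers."""
    return last - first + 1


# Toy list of undetected ORF numbers (hypothetical values).
toy_missing = [1, 2, 3, 4, 5, 6, 10, 11, 20, 21, 22, 23, 24, 25, 26]
toy_regions = contiguous_regions(toy_missing)

# Region boundaries stated in the text.
region8_size = region_size(1018, 1037)    # SAG1018 - SAG1037
region14_size = region_size(1989, 2021)   # SAG1989 - SAG2021
```

Under the consecutive-numbering assumption, the computed sizes of Regions 8 and 14 match the 20 proteins and 33 genes given in the text.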

This invention is also based on identification of clusters of GBS genes as set forth in
Figure 5 and Table 6. In Figure 5, the presence of a particular gene or gene cluster is indicated in
the figure by a red square and the absence of a gene or cluster by a black square. The
relationship between strains based on this analysis is depicted by the tree at the top of the figure.
The strains and their serotypes are indicated (NT: nontypeable). Clusters with identical profiles
are reduced to a single horizontal line and the number of genes in each cluster is indicated on the
right. The clusters of 5 or more genes, labeled in red text and numbered, are listed in Table 6.
The 1698 genes shared by all 19 strains are labeled in green text. Applicants identified the
following subsets:

GBS Subset 13 (a): Cluster 1 (from Table 6).

GBS Subset 13 (b): Cluster 2 (from Table 6).

GBS Subset 13 (c): Cluster 3 (from Table 6).

GBS Subset 13 (d): Cluster 4 (from Table 6).

GBS Subset 13 (e): Cluster 5 (from Table 6).

GBS Subset 13 (f): Cluster 6 (from Table 6).

GBS Subset 13 (g): Cluster 7 (from Table 6).

GBS Subset 13 (h): Cluster 8 (from Table 6).

GBS Subset 13 (i): Cluster 9 (from Table 6).

GBS Subset 13 (j): Cluster 10 (from Table 6).

GBS Subset 13 (k): Cluster 11 (from Table 6).

GBS Subset 13 (l): Cluster 12 (from Table 6).

GBS Subset 13 (m): Cluster 13 (from Table 6).

GBS Subset 13 (n): Cluster 14 (from Table 6).

GBS Subset 13 (o): Cluster 15 (from Table 6).

GBS Subset 13 (p): Cluster 16 (from Table 6).

GBS Subset 13 (q): 1698 ORFs shared by all strains.
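
Collapsing genes with identical presence/absence profiles into clusters, as described for Figure 5, can be sketched as below. The sketch assumes a hypothetical input in which each gene maps to a tuple of 1 (present) or 0 (absent), one entry per strain; the names and the toy data are illustrative and do not reproduce Table 6.

```python
from collections import defaultdict


def cluster_by_profile(presence):
    """Group genes that share an identical presence/absence profile.

    `presence` maps gene -> tuple of 0/1 flags, one per strain.
    Returns {profile: [genes with that profile]}.
    """
    clusters = defaultdict(list)
    for gene, profile in presence.items():
        clusters[tuple(profile)].append(gene)
    return dict(clusters)


def core_genes(presence):
    """Genes present in every strain (all flags equal to 1)."""
    return sorted(g for g, p in presence.items() if all(p))


# Toy profiles across three strains (hypothetical genes).
toy_presence = {
    "g1": (1, 1, 1),
    "g2": (1, 0, 1),
    "g3": (1, 0, 1),
    "g4": (1, 1, 1),
    "g5": (0, 0, 1),
}
toy_clusters = cluster_by_profile(toy_presence)
toy_core = core_genes(toy_presence)

# The text lists clusters of 5 or more genes; the toy threshold is 2.
toy_listed = {p: g for p, g in toy_clusters.items() if len(g) >= 2}
```

In this representation, the genes shared by all strains are simply the cluster whose profile is all ones.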

The invention is also based on the identification of the polynucleotide sequences of 82
genes from up to 11 different GBS strains. 19 of these genes are listed in Table 7. A further
GBS Subset 14 includes this set of polynucleotide sequences from the 11 strains and their
encoded polypeptide sequences. In particular, GBS Subset 14 contains a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between two or more strains (GBS Subset 14(a)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 15 or more contiguous nucleotides which are conserved
between two or more strains (GBS Subset 14(b)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between three or more strains (GBS Subset 14(c)). GBS Subset 14 further includes a Subset of
polynucleotide fragments of 10 or more contiguous nucleotides which are conserved
between four or more strains (GBS Subset 14(d)).

GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more
contiguous amino acids which are conserved between two or more strains (GBS Subset
14(e)). GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more
contiguous amino acids which are conserved between three or more strains (GBS Subset 14(f)).
GBS Subset 14 further includes a Subset of polypeptide fragments of 5 or more contiguous
amino acids which are conserved between four or more strains (GBS Subset 14(g)). GBS
Subset 14 further includes a Subset of polypeptide fragments of 10 or more contiguous amino
acids which are conserved across two or more strains (GBS Subset 14(h)).
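
Identifying fragments of a given minimum length that are conserved between a given number of strains can be sketched as a sliding window over an alignment. The sketch assumes, hypothetically, that the aligned sequences are equal-length strings with "-" marking gaps, and it reports windows of exactly k positions; the function name and the toy alignment are illustrative and are not taken from the Tables.

```python
from collections import defaultdict


def conserved_fragments(aligned, k, min_strains):
    """Find windows of k contiguous residues shared by >= min_strains.

    `aligned` maps strain name -> aligned sequence (equal lengths).
    Returns a list of (start, fragment, sorted strain names).
    Windows containing a gap character are ignored.
    """
    length = len(next(iter(aligned.values())))
    hits = []
    for start in range(length - k + 1):
        groups = defaultdict(list)
        for strain, seq in aligned.items():
            window = seq[start:start + k]
            if "-" not in window:
                groups[window].append(strain)
        for window, strains in groups.items():
            if len(strains) >= min_strains:
                hits.append((start, window, sorted(strains)))
    return hits


# Toy alignment of three hypothetical strains.
toy_alignment = {
    "A": "ATGCCGTA",
    "B": "ATGCCGTT",
    "C": "ATGACGTA",
}
in_all_three = conserved_fragments(toy_alignment, k=3, min_strains=3)
in_two = conserved_fragments(toy_alignment, k=4, min_strains=2)
```

The same function applies to polypeptide alignments; for example, k=5 with min_strains=2 corresponds to the criteria of GBS Subset 14(e), and k=10 with min_strains=2 to those of GBS Subset 14(a). Longer conserved fragments appear as runs of overlapping windows.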

The invention provides for methods of screening a Streptococcal genome for a conserved
or a specific genomic sequence using one or more of the Subsets of the invention.

The invention further provides for an immunogenic composition comprising a
polypeptide expressed by one or more of the polynucleotides in one or more of the Subsets of the
invention, and methods for designing an immunogenic composition by selecting one or more
polypeptides expressed by one or more of the polynucleotides in one or more of the Subsets of
the invention. Preferably, the immunogenic compositions of the invention comprise at least two,
three, four or five polypeptides encoded by polynucleotides within the same Subset.

The invention further provides for methods of screening compounds for activity against
Streptococcal bacteria, which method comprises contacting the compounds with a polypeptide
expressed by the polynucleotide from one of the Subsets of the invention.

The invention further provides for compositions comprising one or more of the
polynucleotides, and fragments thereof, selected from the group consisting of the sequences set
forth in Tables 13 - 31 or 40 - 89.

The invention further provides for compositions comprising polypeptides and fragments
thereof encoded by the polynucleotides set forth in Tables 13 - 31 or 40 - 89.

The invention provides for compositions comprising polypeptides and fragments thereof
set forth in Tables 13 - 31 or 40 - 89.

BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS 
Table 1 comprises a complete list of GBS predicted genes, listed by SAGxxxx ORF
number. The SAGxxxx ORF number corresponds to the genomic sequence for the
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948.
This table also includes the predicted amino acid size of the predicted expressed protein and the
predicted function, if known.

Table 2 comprises a list of predicted and experimentally characterized surface and
secreted proteins from GBS. The SAGxxxx ORF number corresponds to the genomic sequence
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number
AE009948.

Table 3 lists GBS genes which were shared among GBS, GAS and pneumococcus, but
which were not found in any of the other completely sequenced genomes. The SAGxxxx ORF
number corresponds to the genomic sequence for the Streptococcus agalactiae type V strain 2603
V/R available either at the TIGR website by August 28, 2002 at http://www.tigr.org or at the
GenBank database at accession number AE009948.

Table 4 depicts GBS genes which are predicted to have been recently duplicated within
the genome. The SAGxxxx ORF number corresponds to the genomic sequence for the
Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by August
28, 2002 at http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 5 lists the 19 GBS strains used for comparative genome hybridisations and
phylogenetic analysis.

Table 6 lists clusters of GBS genes derived from phylogenetic profiling of GBS strains
based on comparative genome hybridisations. The SAGxxxx ORF number corresponds to the
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at
accession number AE009948.

Table 7 lists the GBS genes used for phylogenetic analyses of the 19 GBS strains. The
SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus agalactiae
type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at
http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 8 lists the 1060 GBS ORFs which are shared with GAS and pneumococcus. The
ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32.
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at
http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 9 lists the 176 GBS ORFs which are shared with pneumococcus but which are not
homologous to a GAS gene. The ORFxxxxx reference number can be translated to SAGxxxx
ORF number by using Table 32. The SAGxxxx ORF number corresponds to the genomic
sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR
website by August 28, 2002 at http://www.tigr.org or at the GenBank database at accession
number AE009948.

Table 10 lists the 225 GBS ORFs which are shared with GAS but which are not
homologous with a pneumococcus gene. The ORFxxxxx reference number can be translated to
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at
accession number AE009948.

Table 11 lists 683 GBS ORFs which are not shared with either GAS or pneumococcus.
The ORFxxxxx reference number can be translated to SAGxxxx ORF number by using Table 32.
The SAGxxxx ORF number corresponds to the genomic sequence for the Streptococcus
agalactiae type V strain 2603 V/R available either at the TIGR website by August 28, 2002 at
http://www.tigr.org or at the GenBank database at accession number AE009948.

Table 12 lists 315 GBS ORFs which are not shared with GAS, pneumococcus or any
other published genomic sequence. The ORFxxxxx reference number can be translated to
SAGxxxx ORF number by using Table 32. The SAGxxxx ORF number corresponds to the
genomic sequence for the Streptococcus agalactiae type V strain 2603 V/R available either at the
TIGR website by August 28, 2002 at http://www.tigr.org or at the GenBank database at
accession number AE009948.

Table 13 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0466. An alignment of each of the sequences is also included.

Table 14 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0471. An alignment of each of the sequences is also included.

Table 15 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0492. An alignment of each of the sequences is also included.

Table 16 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG0767. An alignment of each of the sequences is also included.

Table 17 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1086. An alignment of each of the sequences is also included.

Table 18 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1600. An alignment of each of the sequences is also included.

Table 19 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1680. An alignment of each of the sequences is also included.

Table 20 lists the polynucleotide sequences of the 11 strains relating to GBS ORF
SAG1723. An alignment of each of the sequences is also included.

Table 21 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0079. An alignment of each of the sequences is also included.

Table 22 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0093. An alignment of each of the sequences is also included.

Table 23 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0163. An alignment of each of the sequences is also included.

Table 24 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0290. An alignment of each of the sequences is also included.

Table 25 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0368. An alignment of each of the sequences is also included.

Table 26 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG0503. An alignment of each of the sequences is also included.

Table 27 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1473. An alignment of each of the sequences is also included.

Table 28 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1552. An alignment of each of the sequences is also included.

Table 29 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG1641. An alignment of each of the sequences is also included.

Table 30 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG2147. An alignment of each of the sequences is also included.

Table 31 lists the polynucleotide and polypeptide sequences of the 11 strains relating to
GBS ORF SAG2148. An alignment of each of the sequences is also included.

Table 32 provides a conversion table for the ORFxxxx reference numbers to the
SAGxxxx reference numbers. The SAGxxxx ORF number corresponds to the genomic sequence
for the Streptococcus agalactiae type V strain 2603 V/R available either at the TIGR website by
August 28, 2002 at http://www.tigr.org or at the GenBank database at accession number
AE009948.

Table 33 lists the 1006 GAS ORF's which are shared with GBS and Spn. The sequences
corresponding to these ORFs were published in GenBank, Accession No. AAK33146 (protein
sequence). A link to the corresponding polynucleotide sequence is also available. The numbers
for the GAS ORF refer directly to their GenBank entries.

Table 34 lists the 212 GAS ORF's which are shared with GBS but which do not have
homologues with pneumococcus. The sequences corresponding to these ORFs were published in
GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their
GenBank entries.

Table 35 lists the 62 GAS ORF's which have homologues with pneumococcus but which
do not have homologues with GBS. The sequences corresponding to these ORFs were published
in GenBank, Accession No. AAK33146 (protein sequence). A link to the corresponding
polynucleotide sequence is also available. The numbers for the GAS ORF refer directly to their
GenBank entries.

Table 36 lists the 1034 Spn ORF's which are shared with GBS and GAS. These ORF's
were published in GenBank. The numbers for Spn correspond to the entry for AE005672.

Table 37 lists the 195 Spn ORF's which are shared with GBS but do not have
homologues with GAS. These ORF's were published in GenBank. The numbers for Spn
correspond to the entry for AE005672.

Table 38 lists the 74 Spn ORF's which are shared with GAS but do not have homologues
with GBS. These ORF's were published in GenBank. The numbers for Spn correspond to the
entry for AE005672.

Table 40 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS
ORF SAG0635. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 41 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS
ORF SAG0649. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 42 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0764. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 43 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0079. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 44 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0416. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 45 lists the polynucleotide and polypeptide sequences of 5 strains relating to GBS
ORF SAG1404. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 46 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1615. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 47 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0739. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 48 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1474. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 49 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1502. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 50 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS
ORF SAG1024. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 51 lists the polynucleotide and polypeptide sequences of 7 strains relating to GBS
ORF SAG0677. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 52 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1823. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 53 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0755. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 54 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0949. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 55 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1592. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 56 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0806. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 57 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1488. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 58 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0182. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 59 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG2147. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 60 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1945. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 61 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS
ORF SAG1030. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 62 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0690. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 63 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1912. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 64 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0827. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 65 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS
ORF SAG0231. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 66 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0754. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 67 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0475. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 68 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0499. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 69 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0032. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 70 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS
ORF SAG1280. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 71 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1333. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 72 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0941. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 73 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0981. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 74 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1572. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 75 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0671. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 76 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0260. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 77 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG2059. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 78 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1016. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 79 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG2150. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 80 lists the polynucleotide and polypeptide sequences of 2 strains relating to GBS
ORF SAG1266. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 81 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0011. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 82 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0165. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 83 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0108. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 84 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0267. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 85 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1361. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 86 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1393. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 87 lists the polynucleotide and polypeptide sequences of 8 strains relating to GBS
ORF SAG0645. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 88 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG0477. An alignment of the polynucleotide and polypeptide sequences is also included.

Table 89 lists the polynucleotide and polypeptide sequences of 10 strains relating to GBS
ORF SAG1350. An alignment of the polynucleotide and polypeptide sequences is also included.

Figure 1 is a circular representation of the GBS genome and comparative hybridisations
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002)
99(19): 12391-12396 and online at www.pnas.org.

Figure 2 is a schematic representation of in silico comparisons between streptococci. A
color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391-12396
and online at www.pnas.org.

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences.

Figure 4 depicts a linear representation of the GBS genome. A color version of Figure 4
can be found in the supporting information to Tettelin et al., PNAS (2002) 99(19): 12391-12396,
available online at www.pnas.org.

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative
genome hybridisations. A color version of Figure 5 can be found in the supporting information
to Tettelin et al., PNAS (2002) 99(19): 12391-12396, available online at www.pnas.org.

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to polynucleotides which are conserved or specific to one or more 
species of Streptococcus, Streptococcus species serotypes, and/or serotype isolates. In particular,
the invention relates to polynucleotides from Streptococcus which are conserved or specific to
one or more of the species of S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group
A streptococcus" or "GAS"), and S. agalactiae ("group B streptococcus" or "GBS"). The
invention further relates to polynucleotides which are conserved or specific to one or more
Streptococcal species serotypes, such as GBS serotypes Ia, Ib, II, III, IV, V, VI, VII, and VIII.
The invention still further relates to polynucleotides which are conserved or specific to one or
more clinical isolates of a Streptococcus species.

In order to facilitate an understanding of the invention, selected terms used in the
application will be discussed below.

As used herein, the phrase "species of Streptococcus" generally refers to species of the
Streptococcus genus, including S. pneumoniae ("pneumococcus" or "S. pn."), S. pyogenes ("group A
streptococcus" or "GAS") and S. agalactiae ("group B streptococcus" or "GBS").

As used herein, the phrase "Streptococcus species serotypes" generally refers to
subdivisions based on a distinguishing characteristic within a specific Streptococcus species.
The distinguishing characteristic can be identified by any of a wide range of diagnostic tools.
For instance, GBS is generally recognized as comprising at least nine subdividing serotypes
based on the structure of their polysaccharide capsule.

As used herein, the phrases "serotype isolates" or "clinical isolates" generally refer to
specific isolated bacterial strains of a specific Streptococcal species and serotype.

As used herein in reference to bacterial genomes, the phrases "conserved" or "shared"
generally refer to genomic sequences which have homologues in the two or more genomes in the
reference. Homology references, as used in this application, are generally based on comparisons
using FASTA3. See Pearson (2000) Methods Mol Biol 132: 185-219. When the homology
reference involves a comparison between genes in GBS, GAS or Spn, homologous or shared
genes are typically defined by using a FASTA3 P value cutoff of 10"^^. Where the homology
reference involves a comparison between GBS, GAS or Spn and all other completely sequenced
genomes, homologous or shared genes are typically defined by using a FASTA3 P value cutoff
of 10"^ or lower.

As used herein in reference to bacterial genomes, the phrases "specific to" or "not shared"
generally refer to genomic sequences which do not have homologues in the two or more
genomes in the reference.
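
The "shared" and "specific to" definitions above can be illustrated with a minimal sketch in Python. It assumes that each gene has already been compared against every other genome in the reference and that only the best hit's P value is retained (or None when no hit was reported). The function names, the label "partly shared" and the cutoff of 1e-10 are hypothetical choices made for illustration; the cutoff in particular is a placeholder and is not the value used in this application.

```python
from typing import Dict, Optional


def has_homologue(best_p: Optional[float], cutoff: float) -> bool:
    """True when a best hit exists and its P value is at or below the cutoff."""
    return best_p is not None and best_p <= cutoff


def classify(best_hits: Dict[str, Optional[float]], cutoff: float) -> str:
    """Label one gene from its best-hit P values in the compared genomes.

    'shared'        : a homologue is found in every compared genome
    'specific'      : no homologue is found in any compared genome
    'partly shared' : hypothetical label for the remaining cases
    """
    flags = [has_homologue(p, cutoff) for p in best_hits.values()]
    if all(flags):
        return "shared"
    if not any(flags):
        return "specific"
    return "partly shared"


# Placeholder cutoff, for illustration only.
CUTOFF = 1e-10

print(classify({"GAS": 1e-40, "Spn": 1e-25}, CUTOFF))
print(classify({"GAS": 0.5, "Spn": None}, CUTOFF))
print(classify({"GAS": 1e-40, "Spn": 0.2}, CUTOFF))
```

Keeping the cutoff as a parameter allows the same function to be used for comparisons among the three streptococci and for comparisons against all other completely sequenced genomes, which the text defines with separate cutoffs.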

Other software programs to compare identity and to determine homology between
nucleotide sequences are known in the art, for example those described in section 7.7.18 of
Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A
preferred alignment program is GCG Gap (Genetics Computer Group, Wisconsin, Suite Version
10.1), preferably using default parameters, which are as follows: open gap = 3; extend gap = 1.

Sequences within a Subset of the invention include sequences which hybridize to the
listed genes. Hybridization reactions can be performed under conditions of different
"stringency". Conditions that increase stringency of a hybridization reaction are widely known
and published in the art [e.g. page 7.52 of Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, NY, Cold Spring Harbor Laboratory]. Examples of relevant conditions
include (in order of increasing stringency): incubation temperatures of 25°C, 37°C, 50°C, 55°C
and 68°C; buffer concentrations of 10 x SSC, 6 x SSC, 1 x SSC, 0.1 x SSC (where SSC is
0.15 M NaCl and 15 mM citrate buffer) and their equivalents using other buffer systems;
formamide concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24
hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and wash
solutions of 6 x SSC, 1 x SSC, 0.1 x SSC, or de-ionized water. Hybridization techniques and
their optimization are well known in the art [e.g. see Sambrook et al.; RNA Methodologies
(Farrell, 1998) (Academic Press; ISBN 0-12-249695-7); Current Protocols in Molecular Biology
(F.M. Ausubel et al., eds., 1987) Supplement 30; Short Protocols in Molecular Biology (4th
edition, 1999) Ausubel et al. eds. ISBN 0-471-32938-X; US patent 5,707,829 etc.].
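
As a worked illustration of the buffer concentrations listed above, the following Python sketch converts an "N x SSC" designation into salt concentrations, using the composition of 1 x SSC given in the text (0.15 M NaCl and 15 mM citrate). The function name is hypothetical. Lower SSC multiples correspond to lower salt and therefore to higher stringency, which matches the order in which the buffers are listed.

```python
from typing import Tuple

# Composition of 1 x SSC as stated in the text.
SSC_1X_NACL_M = 0.15       # mol/L NaCl
SSC_1X_CITRATE_MM = 15.0   # mmol/L citrate


def ssc_composition(fold: float) -> Tuple[float, float]:
    """Return (NaCl in M, citrate in mM) for an N x SSC buffer."""
    return (fold * SSC_1X_NACL_M, fold * SSC_1X_CITRATE_MM)


# Buffers in the order given in the text (increasing stringency).
for fold in (10, 6, 1, 0.1):
    nacl, citrate = ssc_composition(fold)
    print(f"{fold} x SSC: {nacl:.3f} M NaCl, {citrate:.1f} mM citrate")
```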

Identity between polypeptide sequences can be determined using software programs
known in the art, for example those described in section 7.7.18 of Current Protocols in
Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30. A preferred alignment is
determined by the Smith-Waterman homology search algorithm [Smith & Waterman (1981) Adv.
Appl. Math. 2: 482-489] using an affine gap search with a gap open penalty of 12 and a gap
extension penalty of 2, BLOSUM matrix 62.
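
The affine gap search mentioned above can be sketched in Python as follows. This is a minimal illustration of a Smith-Waterman local alignment score with affine gap penalties, not the implementation used for this application. Two assumptions are made and should be read as such: a toy scoring function (+5 for a match, -4 for a mismatch) stands in for the BLOSUM62 matrix, and a gap of length k is charged the open penalty for its first position plus the extension penalty for each further position. Other programs define the two penalties differently.

```python
from typing import Callable


def toy_score(x: str, y: str) -> int:
    """Stand-in for BLOSUM62 (assumption): +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def smith_waterman_affine(a: str, b: str,
                          score: Callable[[str, str], int] = toy_score,
                          gap_open: int = 12, gap_extend: int = 2) -> float:
    """Return the best local alignment score of a and b with affine gaps."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap in a; F: ending with a gap in b
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Second sequence lacks the single residue "H" of the first.
print(smith_waterman_affine("ACDEFGHIKLMN", "ACDEFGIKLMN"))
```

With the toy scores, eleven matched residues (55) minus one gap opening (12) gives a better score than either ungapped flank alone, so the gapped alignment is chosen.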

Typically, 50% identity or more between two proteins may be considered to be an
indication of functional equivalence. References to a percentage sequence identity between two
amino acid sequences mean that, when aligned, that percentage of amino acids are the same in
comparing the two sequences.
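
The percentage identity described above can be computed from two already aligned sequences as in the following Python sketch. The function name is hypothetical, and the choice of denominator is an assumption: here identical residues are divided by the total number of alignment columns, including columns that contain a gap. Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        return 0.0
    # A column counts only when the residues match and neither is a gap.
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * same / columns


identity = percent_identity("ACDE-F", "ACDQGF")
print(f"{identity:.1f}% identity; at or above 50%: {identity >= 50.0}")
```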

The terms "polypeptide", "protein" and "amino acid sequence" as used herein generally
refer to a polymer of amino acid residues and are not limited to a minimum length of the product.
Thus, peptides, oligopeptides, dimers, multimers, and the like, are included within the definition.
Both full-length proteins and fragments thereof are encompassed by the definition. Minimum
fragments of polypeptides useful in the invention can be at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 18, 20, 25, 30, 35, 40 or 50 amino acids. Typically, polypeptides useful in this invention
can have a maximum length suitable for the intended application. Generally, the maximum
length is not critical and can easily be selected by one skilled in the art.

Reference to polypeptides and the like also includes derivatives of the amino acid
sequences of the invention. Such derivatives can include postexpression modifications of the
polypeptide, for example, glycosylation, acetylation, phosphorylation, and the like. Amino acid
derivatives can also include modifications to the native sequence, such as deletions, additions
and substitutions (generally conservative in nature), so long as the protein maintains the desired
activity. These modifications may be deliberate, as through site-directed mutagenesis, or may be
accidental, such as through mutations of hosts which produce the proteins or errors due to PCR
amplification. Furthermore, modifications may be made that have one or more of the following
effects: reducing toxicity; facilitating cell processing (e.g. secretion, antigen presentation, etc.);
and facilitating presentation to B-cells and/or T-cells.

A "recombinant" protein is a protein which has been prepared by recombinant DNA
techniques as described herein. In general, the gene of interest is cloned and then expressed in
transformed organisms, as described further below. The host organism expresses the foreign
gene to produce the protein under expression conditions. The polypeptides of the invention may
be prepared by recombinant means.

The term "polynucleotide", as known in the art, generally refers to a nucleic acid
molecule. A "polynucleotide" can include both double- and single-stranded sequences and refers
to, but is not limited to, cDNA from viral, prokaryotic or eukaryotic mRNA, genomic RNA and
DNA sequences from viral (e.g. RNA and DNA viruses and retroviruses) or prokaryotic DNA,
and especially synthetic DNA sequences. The term also captures sequences that include any of
the known base analogs of DNA and RNA, and includes modifications such as deletions,
additions and substitutions (generally conservative in nature), to the native sequence, so long as
the nucleic acid molecule encodes a therapeutic or antigenic protein. These modifications may
be deliberate, as through site-directed mutagenesis, or may be accidental, such as through
mutations of hosts that produce the antigens. Modifications of polynucleotides may have any
number of effects including, for example, facilitating expression of the polypeptide product in a
host cell.

The term "polynucleotide" further includes DNA, RNA, DNA/RNA hybrids, DNA and
RNA analogues such as those containing modified backbones (with modifications in the sugar
and/or phosphates e.g. phosphorothioates, phosphoramidites etc.), and also peptide nucleic acids
(PNA) and any other polymer comprising purine and pyrimidine bases or other natural,
chemically or biochemically modified, non-natural, or derivatized nucleotide bases etc. Nucleic
acid according to the invention can be prepared in many ways (e.g. by chemical synthesis, from
genomic or cDNA libraries, from the organism itself etc.) and can take various forms (e.g. single
stranded, double stranded, vectors, probes etc.).

A polynucleotide can encode a biologically active (e.g., immunogenic or therapeutic)
protein or polypeptide. Depending on the nature of the polypeptide encoded by the
polynucleotide, a polynucleotide can include as few as 10 nucleotides, e.g., where the
polynucleotide encodes an antigen. The polynucleotides of the invention may comprise at least
10, 13, 15, 18, 20, 22, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 80, 90 or 100 consecutive
nucleotides.

By "isolated" is meant, when referring to a polynucleotide or a polypeptide, that the
indicated molecule is separate and discrete from the whole organism with which the molecule is
found in nature or, when the polynucleotide or polypeptide is not found in nature, is sufficiently
free of other biological macromolecules so that the polynucleotide or polypeptide can be used for
its intended purpose.

"Antibody" as known in the art includes one or more biological moieties that, through
chemical or physical means, can bind to or associate with an epitope of a polypeptide of interest.
The antibodies of the invention specifically bind to infectious prion conformations. The term
"antibody" includes antibodies obtained from both polyclonal and monoclonal preparations, as
well as the following: hybrid (chimeric) antibody molecules (see, for example, Winter et al.
(1991) Nature 349: 293-299; and U.S. Patent No. 4,816,567); F(ab')2 and F(ab) fragments; Fv
molecules (non-covalent heterodimers, see, for example, Inbar et al. (1972) Proc Natl Acad Sci
USA 69:2659-2662; and Ehrlich et al. (1980) Biochem 19:4091-4096); single-chain Fv molecules
(sFv) (see, for example, Huston et al. (1988) Proc Natl Acad Sci USA 85:5879-5883); dimeric
and trimeric antibody fragment constructs; minibodies (see, e.g., Pack et al. (1992) Biochem
31:1579-1584; Cumber et al. (1992) J Immunology 149B:120-126); humanized antibody
molecules (see, for example, Riechmann et al. (1988) Nature 332:323-327; Verhoeyen et al.
(1988) Science 239:1534-1536; and U.K. Patent Publication No. GB 2,276,169, published 21
September 1994); and, any functional fragments obtained from such molecules, wherein such
fragments retain immunological binding properties of the parent antibody molecule. The term
"antibody" further includes antibodies obtained through non-conventional processes, such as
phage display.

As used herein, the term "monoclonal antibody" refers to an antibody composition
having a homogeneous antibody population. The term is not limited regarding the species or
source of the antibody, nor is it intended to be limited by the manner in which it is made. Thus,
the term encompasses antibodies obtained from murine hybridomas, as well as human
monoclonal antibodies obtained using human rather than murine hybridomas. See, e.g., Cote, et
al. Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, 1985, p 77.

An "immunogenic composition" as used herein refers to a composition that comprises an
antigenic molecule where administration of the composition to a subject results in the
development in the subject of a humoral and/or a cellular immune response to the antigenic
molecule of interest. The immunogenicity of the composition or the antigenicity of the molecule
may be facilitated by the use of an adjuvant.

The practice of the present invention will employ, unless otherwise indicated,
conventional methods of chemistry, biochemistry, molecular biology, immunology and
pharmacology, within the skill of the art. Such techniques are explained fully in the literature.
See, e.g., Remington's Pharmaceutical Sciences, 18th Edition (Easton, Pennsylvania: Mack
Publishing Company, 1990); Methods In Enzymology (S. Colowick and N. Kaplan, eds.,
Academic Press, Inc.); and Handbook of Experimental Immunology, Vols. I-IV (D.M. Weir and
C.C. Blackwell, eds., 1986, Blackwell Scientific Publications); Sambrook, et al., Molecular
Cloning: A Laboratory Manual (2nd Edition, 1989); Handbook of Surface and Colloidal
Chemistry (Birdi, K.S. ed., CRC Press, 1997); Short Protocols in Molecular Biology, 4th ed.
(Ausubel et al. eds., 1999, John Wiley & Sons); Molecular Biology Techniques: An Intensive
Laboratory Course, (Ream et al., eds., 1998, Academic Press); PCR (Introduction to
Biotechniques Series), 2nd ed. (Newton & Graham eds., 1997, Springer Verlag); Peters and
Dalrymple, Fields Virology (2d ed), Fields et al. (eds.), B.N. Raven Press, New York, NY.

It is understood that the antibodies and methods of this invention are not limited to
particular formulations or process parameters as such may, of course, vary. It is also to be
understood that the terminology used herein is for the purpose of describing particular
embodiments of the invention only, and is not intended to be limiting.

All publications, patents and patent applications cited herein are hereby incorporated by 
reference in their entirety. 

Vaccines and Immunisation

The invention provides an immunogenic composition comprising a polypeptide, or a
fragment thereof, which is encoded by a polynucleotide sequence which is conserved across one
or more species of Streptococcus.

The polynucleotide is preferably conserved across one or more species of Streptococcus
selected from the group consisting of GBS, GAS and pneumococcus. In one embodiment, the
polynucleotide is a GBS polynucleotide which is homologous with at least one gene from both
GAS and pneumococcus. Preferably, the GBS polynucleotide is selected from GBS Subset 1,
which includes 1060 GBS genes which have homologues with both GAS and pneumococcus
(Table 8).

In another embodiment, the polynucleotide is a GAS polynucleotide which is
homologous with at least one gene from both GBS and pneumococcus. Preferably, the GAS
polynucleotide is selected from GAS Subset 1, which includes 1006 GAS genes which have
homologues with both GBS and pneumococcus.

In another embodiment, the polynucleotide is a pneumococcal polynucleotide which is
homologous with at least one gene from both GAS and GBS. Preferably, the pneumococcus
polynucleotide is selected from Spn Subset 1, which includes 1034 pneumococcal genes which
have homologues with both GBS and GAS.

In another embodiment, the polynucleotide is a GBS polynucleotide which is
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from
one of the genes listed in GBS Subset 2, which includes 225 GBS genes which have homologues
with GAS, but not with pneumococcus.

In another embodiment, the polynucleotide is a GBS polynucleotide which is
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is
selected from GBS Subset 3, which includes 176 GBS genes which have homologues with
pneumococcus.

In another embodiment, the polynucleotide is a GAS polynucleotide which is
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from
GAS Subset 2, which includes 212 GAS genes which have a homologue with GBS.

In another embodiment, the polynucleotide is a GAS polynucleotide which is
homologous with at least one gene from pneumococcus. Preferably, the polynucleotide is selected
from GAS Subset 3, which includes 62 GAS genes which have a homologue with
pneumococcus.

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is
homologous with at least one gene from GBS. Preferably, the polynucleotide is selected from
Spn Subset 2, which includes 195 Spn genes which have a homologue with GBS.

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GAS. Preferably, the polynucleotide is selected from 
Spn Subset 3, which includes 74 Spn genes which have a homologue with GAS. 

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more species of Streptococcus.

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide which is specific to GBS, GAS and
pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is
homologous to at least one gene from both GAS and pneumococcus. Preferably, the GBS
polynucleotide is selected from GBS Subset 1. In an alternative embodiment, the polynucleotide
is a GBS polynucleotide which is homologous to at least one gene from both GAS and
pneumococcus, but which is not homologous to a gene in any other published bacterial genome
at the time of the invention. Preferably, the GBS polynucleotide is selected from one of the 12
GBS genes included in GBS Subset 1(a) (Table 3).

In another embodiment, the polynucleotide is a GAS polynucleotide which is
homologous to at least one gene in both GBS and pneumococcus. Preferably, the GAS
polynucleotide is selected from GAS Subset 1. In another embodiment, the polynucleotide is a
GAS polynucleotide which is homologous to at least one gene in both GBS and pneumococcus
but which is not homologous to any gene in any other published bacterial genome at the time of
the invention. Preferably, the GAS polynucleotide is selected from GAS Subset 1(a).

Alternatively, the polynucleotide is a pneumococcus polynucleotide which is homologous
to at least one gene in both GBS and GAS. Preferably, the pneumococcus polynucleotide is
selected from Spn Subset 1. In another embodiment, the polynucleotide is a pneumococcus
polynucleotide which is homologous to at least one gene in both GBS and GAS but which does
not have a homologue in any other published bacterial genome at the time of the invention.
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 1(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS.
In one embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to a
gene in either GAS or pneumococcus. Preferably, the GBS polynucleotide is selected from one
of the 683 GBS genes included in GBS Subset 4. In a further embodiment, the polynucleotide is
a GBS polynucleotide which is not homologous to a gene in either GAS or pneumococcus or any
other published bacterial genome at the time of the invention. Preferably, the GBS
polynucleotide is selected from one of the 315 GBS genes in GBS Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS.
In one embodiment, the polynucleotide is a GAS polynucleotide which is not homologous to a
gene in either GBS or pneumococcus. Preferably, the GAS polynucleotide is selected from one
of the 416 GAS genes included in GAS Subset 4. In a further embodiment, the polynucleotide is
a GAS polynucleotide which does not have a homologue in either GBS or pneumococcus or in
any other published bacterial genome at the time of the invention. Preferably, the GAS
polynucleotide is selected from GAS Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to
pneumococcus. In one embodiment, the polynucleotide is a pneumococcus polynucleotide
which is not homologous to a gene in either GBS or GAS. Preferably, the pneumococcus
polynucleotide is selected from one of the 836 Spn genes included in Spn Subset 4. In a further
embodiment, the polynucleotide is a pneumococcus polynucleotide which does not have a
homologue in either GBS or GAS or in any other published bacterial genome at the time of the
invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset 4(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS
and GAS. In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous
to at least one gene from GAS but is not homologous to a gene from pneumococcus. Preferably,
the GBS polynucleotide is selected from one of the 225 GBS genes included in GBS Subset 2.
In another embodiment, the GBS polynucleotide is homologous to at least one gene from GAS
but is not homologous to any gene from pneumococcus and does not have a homologue in any
other published bacterial genome at the time of the invention. Preferably, the GBS
polynucleotide is selected from GBS Subset 2(a).

In another embodiment, the polynucleotide is a GAS polynucleotide which is 
homologous to at least one gene from GBS but is not homologous to any gene from 
pneumococcus. Preferably, the GAS polynucleotide is selected from one of the 212 GAS genes 
included in GAS Subset 2. In another embodiment, the GAS polynucleotide is homologous to at 
least one gene from GBS but is not homologous to any gene from pneumococcus and does not 
have a homologous gene in any other published bacterial genome at the time of the invention.
Preferably, the GAS polynucleotide is selected from GAS Subset 2(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GBS
and pneumococcus. In one embodiment, the polynucleotide is a GBS polynucleotide which is
homologous to at least one gene from pneumococcus but is not homologous to any gene from
GAS. Preferably, the GBS polynucleotide is selected from one of the 176 GBS genes included
in GBS Subset 3. In another embodiment, the polynucleotide is a GBS polynucleotide which is
homologous with at least one gene from pneumococcus but is not homologous with any GAS
polynucleotide and does not have a homologous gene in any of the other published bacterial 
genomes at the time of the invention. Preferably, the GBS polynucleotide is selected from GBS 
Subset 3(a). 

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is 
homologous with at least one gene from GBS, but is not homologous with any gene from GAS. 
Preferably, the pneumococcus polynucleotide is selected from one of the 195 Spn genes included
in Spn Subset 2. In another embodiment, the polynucleotide is a pneumococcus polynucleotide 
which is homologous with at least one gene from GBS, but is not homologous with any gene 
from GAS and does not have a homologous gene in any other published bacterial genome at the 
time of the invention. Preferably, the pneumococcus polynucleotide is selected from Spn Subset
2(a).

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to GAS
and pneumococcus. In one embodiment, the polynucleotide is a GAS polynucleotide which is
homologous with at least one gene from pneumococcus but is not homologous with any gene
from GBS. Preferably, the GAS polynucleotide is selected from one of the 62 GAS genes
included in GAS Subset 3. In another embodiment, the polynucleotide is a GAS polynucleotide
which is homologous with at least one gene from pneumococcus but is not homologous with any
gene from GBS and is not homologous with any gene of any published bacterial genome at the
time of the invention. Preferably, the GAS polynucleotide is selected from GAS Subset 3(a).

In another embodiment, the polynucleotide is a pneumococcus polynucleotide which is
homologous with at least one GAS polynucleotide, but is not homologous with any GBS gene.
Preferably, the pneumococcus polynucleotide is selected from one of the 74 Spn genes included in
Spn Subset 3. In another embodiment, the polynucleotide is a pneumococcus polynucleotide
which is homologous with at least one gene from GAS, but is not homologous with any gene
from GBS or with a gene from any other published bacterial genome at the time of the invention.
Preferably, the pneumococcus polynucleotide is selected from Spn Subset 3(a).
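
The Subset numbering used in the preceding embodiments can be read as a simple partition of each genome by where homologues are found. The following Python sketch is illustrative only: it assumes homology has already been determined by some unspecified comparison, and the names `subset_labels`, `SPECIES` and the `"other"` tag (standing for any other published bacterial genome) are hypothetical and not part of the specification.

```python
SPECIES = ("GBS", "GAS", "Spn")  # Spn = pneumococcus
OTHER = "other"  # hypothetical tag: any other published bacterial genome


def subset_labels(species, homologue_sources):
    """Return the Subset labels that apply to one gene.

    species: genome the gene comes from ("GBS", "GAS" or "Spn").
    homologue_sources: set of tags naming where a homologue was found,
        drawn from SPECIES plus OTHER (assumed to be precomputed).
    """
    if species not in SPECIES:
        raise ValueError(f"unknown species: {species!r}")
    # The two comparison genomes, in the order used by the Subset numbering:
    # GBS -> (GAS, Spn), GAS -> (GBS, Spn), Spn -> (GBS, GAS).
    first, second = [s for s in SPECIES if s != species]
    in_first = first in homologue_sources
    in_second = second in homologue_sources
    if in_first and in_second:
        number = 1  # homologues in both other species
    elif in_first:
        number = 2  # homologue only in the first comparison species
    elif in_second:
        number = 3  # homologue only in the second comparison species
    else:
        number = 4  # no homologue in either other species
    labels = [f"{species} Subset {number}"]
    # The "(a)" Subsets add the condition that no other published
    # bacterial genome contains a homologue.
    if OTHER not in homologue_sources:
        labels.append(f"{species} Subset {number}(a)")
    return labels
```

For instance, under these assumptions a GBS gene with a homologue only in GAS would be labelled GBS Subset 2 and GBS Subset 2(a).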

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more Streptococcal species serotypes. Preferably, the polynucleotide is specific to a
Streptococcal species serotype selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS serotypes
selected from the group consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and VIII.

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across
one or more Streptococcal species serotypes. Preferably, the polynucleotide is specific to a
Streptococcal species serotype selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is conserved across one or more GBS
serotypes selected from the group consisting of GBS serotype Ia, Ib, II, III, IV, V, VI, VII and
VIII.

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is specific to one or
more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is specific to a
Streptococcal species clinical isolate selected from the Streptococcal species GBS, GAS and
pneumococcus. More preferably, the polynucleotide is specific to one or more GBS clinical
isolates selected from the clinical isolates identified in Table 5. Still more preferably, the
polynucleotide is specific to one or more GBS clinical isolates having one or more genes
selected from the genes listed in Table 7.

In another embodiment, the polynucleotide is a GBS polynucleotide which is 
homologous to at least one gene from both GAS and pneumococcus and which varies among
clinical isolates. In another embodiment, the polynucleotide is a GBS polynucleotide which is
homologous to at least one gene from both GAS and pneumococcus and which is homologous
with at least one gene from at least one of the clinical isolates identified in Table 5. In another
embodiment, the polynucleotide is a GBS polynucleotide which is homologous to at least one 
gene from both GAS and pneumococcus and which is homologous with at least one gene from 
each of the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from 
one of the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from GAS and is not homologous to any gene from pneumococcus and which 
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from GAS and is not homologous to 
any gene from pneumococcus and which is homologous to at least one gene from at least one of
the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a GBS
polynucleotide which is homologous to at least one gene from GAS and is not homologous to
any gene from pneumococcus and which is homologous to at least one gene from each of the
clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of the
genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is homologous to 
at least one gene from pneumococcus and is not homologous to any gene from GAS and which 
varies among clinical isolates. In another embodiment, the polynucleotide is a GBS 
polynucleotide which is homologous to at least one gene from pneumococcus and is not 
homologous to any gene from GAS and which is homologous to at least one gene from at least 
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a 
GBS polynucleotide which is homologous to at least one gene from pneumococcus and is not 
homologous to any gene from GAS and which is homologous to at least one gene from each of 
the clinical isolates identified in Table 5. Preferably, the polynucleotide is selected from one of 
the genes listed in Table 7. 

In one embodiment, the polynucleotide is a GBS polynucleotide which is not 
homologous to any gene from GAS or pneumococcus and which varies among clinical isolates. 
In another embodiment, the polynucleotide is a GBS polynucleotide which is not homologous to
any gene from GAS or pneumococcus and which is homologous to at least one gene from at least
one of the clinical isolates identified in Table 5. In another embodiment, the polynucleotide is a
GBS polynucleotide which is not homologous to any gene from GAS or pneumococcus and
which is homologous to at least one gene from each of the clinical isolates identified in Table 5.
Preferably, the polynucleotide is selected from one of the genes listed in Table 7.

The invention further provides an immunogenic composition comprising a polypeptide,
or a fragment thereof, which is encoded by a polynucleotide sequence which is conserved across
one or more clinical isolates of a Streptococcal species. Preferably, the polynucleotide is
conserved across one or more Streptococcal clinical isolates selected from the Streptococcal
species GBS, GAS and pneumococcus. More preferably, the polynucleotide is conserved across
one or more GBS clinical isolates identified in Table 5. Still more preferably, the polynucleotide
is conserved across one or more clinical isolates having one or more genes selected from the
genes listed in Table 7.

The invention further provides for an immunogenic composition comprising a
polypeptide, or a fragment thereof, encoded by a polynucleotide selected from one or more of the
Subsets of the invention. Accordingly, the invention provides for an immunogenic composition
comprising a polypeptide encoded by a polynucleotide selected from one or more of the
following Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, GBS Subset 4, GAS Subset 1,
GAS Subset 2, GAS Subset 3, GAS Subset 4, Spn Subset 1, Spn Subset 2, Spn Subset 3, Spn
Subset 4, GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), GBS Subset 4(a), GAS Subset
1(a), GAS Subset 2(a), GAS Subset 3(a), GAS Subset 4(a), Spn Subset 1(a), Spn Subset 2(a),
Spn Subset 3(a), Spn Subset 4(a), GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), GBS
Subset 4(b), GBS Subset 5, GBS Subset 6, GBS Subset 6(a), GBS Subset 7, GBS Subset 8, GBS
Subset 9, GBS Subset 10, GBS Subset 11, GBS Subset 12, GBS Subset 12(a), GBS Subset
12(b), GBS Subset 12(c), GBS Subset 12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset
12(g), GBS Subset 12(h), GBS Subset 12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset
12(l), GBS Subset 12(m), GBS Subset 12(n), GBS Subset 12(o), GBS Subset 13(a), GBS Subset
13(b), GBS Subset 13(c), GBS Subset 13(d), GBS Subset 13(e), GBS Subset 13(f), GBS Subset
13(g), GBS Subset 13(h), GBS Subset 13(i), GBS Subset 13(j), GBS Subset 13(k), GBS Subset
13(l), GBS Subset 13(m), GBS Subset 13(n), GBS Subset 13(o), GBS Subset 13(p), GBS Subset
13(q), GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h).

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 1, GBS Subset 2, GBS Subset 3, and GBS Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GAS Subset 1, GAS Subset 2, GAS Subset 3, and GAS Subset 4. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: Spn Subset 1, Spn Subset 2, Spn Subset 3, and Spn Subset 4.

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 1(a), GBS Subset 2(a), GBS Subset 3(a), and GBS Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GAS Subset 1(a), GAS Subset 2(a), GAS Subset 3(a), and GAS Subset 4(a).

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: Spn Subset 1(a), Spn Subset 2(a), Spn Subset 3(a), and Spn Subset 4(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 1(b), GBS Subset 2(b), GBS Subset 3(b), and GBS Subset 4(b).

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from GBS Subset 5. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 6 and GBS Subset 6(a). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 7. 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 8. 

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 9. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 10. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 11. 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following 
Subsets: GBS Subset 12, GBS Subset 12(a), GBS Subset 12(b), GBS Subset 12(c), GBS Subset 
12(d), GBS Subset 12(e), GBS Subset 12(f), GBS Subset 12(g), GBS Subset 12(h), GBS Subset 
12(i), GBS Subset 12(j), GBS Subset 12(k), GBS Subset 12(l), GBS Subset 12(m), GBS Subset
12(n), and GBS Subset 12(o). 

The invention provides for an immunogenic composition comprising a polypeptide, or a 
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 13(a), GBS Subset 13(b), GBS Subset 13(c), GBS Subset 13(d), GBS
Subset 13(e), GBS Subset 13(f), GBS Subset 13(g), GBS Subset 13(h), GBS Subset 13(i), GBS
Subset 13(j), GBS Subset 13(k), GBS Subset 13(l), GBS Subset 13(m), GBS Subset 13(n), GBS
Subset 13(o), GBS Subset 13(p), and GBS Subset 13(q).

The invention provides for an immunogenic composition comprising a polypeptide, or a
fragment thereof, encoded by a polynucleotide selected from one or more of the following
Subsets: GBS Subset 14, GBS Subset 14(a), GBS Subset 14(b), GBS Subset 14(c), GBS Subset 
14(d), GBS Subset 14(e), GBS Subset 14(f), GBS Subset 14(g), and GBS Subset 14(h). 

Each of the above-identified groups and subsets may be used to create immunogenic 
compositions comprising two or more Streptococcus polypeptides. The invention then provides 
for an immunogenic composition comprising a combination of Streptococcus polypeptides, said 
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides
selected from one of the groups identified above. Preferably, the combination consists of two, 
three, four or five polypeptides. Preferably, the polypeptides are all selected from the same 
group. Preferably, the polypeptides are selected from the same Subset described herein. The 
Streptococcus polypeptides are selected from GBS, GAS and pneumococcus. Preferably, all of 
the polypeptides in the combination are selected from the same species. 
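
As a rough illustration of the combinations described above, the following Python sketch enumerates candidate combinations of two to five polypeptides drawn from a single Subset. The function name `candidate_combinations` and the placeholder polypeptide identifiers are hypothetical; the specification does not prescribe any particular enumeration procedure.

```python
from itertools import combinations


def candidate_combinations(subset_members, sizes=(2, 3, 4, 5)):
    """List every combination of the requested sizes from one Subset.

    subset_members: identifiers of polypeptides that all belong to the
        same Subset (placeholder strings in this sketch).
    sizes: combination sizes to enumerate; the text allows two to ten,
        with two to five preferred.
    """
    for k in sizes:
        if not 2 <= k <= 10:
            raise ValueError("combination size must be between 2 and 10")
    members = sorted(set(subset_members))  # de-duplicate, fix the order
    result = []
    for k in sizes:
        result.extend(combinations(members, k))
    return result
```

Because the number of combinations grows quickly with the size of a Subset, a sketch like this is only practical for a short list of pre-selected candidates.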

For example, the composition may comprise a combination of GBS polypeptides, said
combination consisting of two, three, four, five, six, seven, eight, nine, or ten polypeptides,
wherein each polypeptide is encoded by a GBS polynucleotide sequence which is homologous to
a polynucleotide sequence of both GAS and pneumococcus. Preferably, the combination
consists of two, three, four or five polypeptides. Preferably, the GBS polynucleotide sequences
are selected from GBS Subset 1.

As another example, the composition may comprise a combination of GBS polypeptides, 
said combination consisting of two, three, four or five polypeptides, wherein each polypeptide is 
encoded by a GBS polynucleotide sequence which is homologous to a polynucleotide sequence
of GAS. Preferably, the GBS polynucleotide sequences are selected from GBS Subset 2.

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS polynucleotide sequence which is homologous to a polynucleotide sequence of 
Streptococcus pneumoniae. Preferably, the GBS polynucleotide sequences are selected from GBS
Subset 3.

The composition may comprise a combination of GBS polypeptides, said combination 
consisting of two, three, four or five polypeptides, wherein each polypeptide is encoded by a 
GBS serotype polynucleotide sequence which is homologous to at least one other GBS serotype. 
Preferably, the GBS polypeptides are encoded by GBS serotype polynucleotide sequences which 
are homologous to at least one other GBS serotype.

The invention further provides for an immunogenic composition comprising a
polypeptide or a fragment thereof comprising a fusion protein encoded by one or more of the
polynucleotides included in the Subsets of the invention.

The invention further provides a method for designing an immunogenic composition,
such as a vaccine, by selecting one or more polypeptides encoded by a polynucleotide selected
from one or more of the Subsets of the invention. Preferably, the immunogenic compositions of
the invention comprise at least two, three, four or five polypeptides encoded by polynucleotides 
within the same Subset. 

The invention provides a method for raising an immune response in a patient by
administering any one of the immunogenic compositions set forth above. The choice of
immunogenic composition means that the immune response may be reactive against all three of
GAS, GBS and pneumococcus, may be reactive against only two of the three, or may be reactive
only against GBS.

Each of the immunogenic compositions described above may be prepared and
administered instead as a polynucleotide where the polypeptide is expressed in vivo. 

The immune response is preferably an antibody response. It may be a protective immune 
response. The patient is preferably a human. 

The immunogenic compositions of the invention may further comprise an adjuvant, as 
discussed in further detail below. 

Essential genes and knockouts 

The invention provides a Streptococcus bacterium wherein one or more genes within any 
of the Subsets of this invention have been knocked out. The choice of Subset means that the 
knocked out gene may be, for instance, a gene found in GBS but not in GAS or pneumococcus
(e.g. which is involved in the pathogenesis of GBS, but not in the pathogenesis of GAS or
pneumococcus, such as binding GBS cellular targets). 

Techniques for producing knockout bacteria are well known, and knockout Streptococci
of various species have been reported [e.g. Margolis et al. (2001) Antimicrob. Agents Chemother.
45:2432-2435; Zhang et al. (2000) Cell 102:827-837; Nizet et al. (2000) Infect. Immun. 68:4245-
4254; Nizet et al. (1997) Adv. Exp. Med. Biol. 418:627-630; etc.].

The knockout mutation may be situated in the coding region of the gene or may lie within 
its transcriptional control regions (e.g. within its promoter). 

The knockout mutation will reduce the level of mRNA encoding the corresponding
polypeptide to <1% of that produced by the wild-type bacterium, preferably <0.5%, more 
preferably <0.1%, and most preferably to 0%. 
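
The thresholds in the preceding paragraph can be expressed as a small check on measured mRNA levels. The Python sketch below is illustrative only: the function name `knockout_tier`, the tier wording and the idea of passing two measured quantities in the same units are assumptions, and no particular measurement technique is implied.

```python
def knockout_tier(mutant_mrna, wild_type_mrna):
    """Classify a mutant by its residual mRNA level relative to wild type.

    Both arguments are mRNA quantities in the same (arbitrary) units.
    """
    if wild_type_mrna <= 0:
        raise ValueError("wild-type mRNA level must be positive")
    if mutant_mrna < 0:
        raise ValueError("mutant mRNA level cannot be negative")
    percent = 100.0 * mutant_mrna / wild_type_mrna
    if percent == 0:
        return "most preferred (0%)"
    if percent < 0.1:
        return "more preferred (<0.1%)"
    if percent < 0.5:
        return "preferred (<0.5%)"
    if percent < 1:
        return "knockout (<1%)"
    return "not a knockout (>=1%)"
```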

The knockout mutants of the invention may be used as immunogenic compositions (e.g.
as vaccines) to prevent streptococcal infection. Such a vaccine may include the mutant as a live 
attenuated bacterium. 

The knockout mutants of the invention may be used to determine whether genes are 
essential for bacterial survival, either under normal or stress conditions. 

Antisense 

The invention provides a single-stranded nucleic acid comprising a fragment of x1 or
more nucleotides from a nucleotide sequence selected from one of the Subsets of the invention. 
The choice of group means that the nucleic acid may be complementary to a gene sequence 
found in GBS, GAS and pneumococcus, or a gene sequence specific to GBS. 

The single-stranded nucleic acid is at least x1 nucleotides long. The value of x1 is at least
7 (e.g. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50 etc.). The single-stranded
nucleic acid may be at most x2 nucleotides long, wherein x2 is 100 or less (e.g. 99, 98, 97, 96, 95,
94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69,
68, 67, 66, 65, 64, 63, 62, 61, 60).

The nucleic acid is preferably of the formula 5'-(N)a-(X)-(N)b-3', wherein 0≤a≤15,
0≤b≤15, N is any nucleotide, and X is the fragment as defined above. The values of a and b may
independently be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15. Each individual nucleotide N
in the -(N)a- and -(N)b- portions of the nucleic acid may be the same or different. The length of
the nucleic acid (a+b+x1) is preferably x2 or less.
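
The length constraints on the flanked construct, that is, up to 15 nucleotides on each side of a fragment X of at least x1 nucleotides with a total length of x2 or less, can be sketched in Python as follows. The function name `build_nucleic_acid`, the restriction to the DNA alphabet A, C, G and T, and the treatment of the preferred upper length as a hard limit are assumptions made for this illustration.

```python
VALID_NUCLEOTIDES = set("ACGT")  # assumption: DNA alphabet only


def build_nucleic_acid(fragment, five_prime="", three_prime="", x1=7, x2=100):
    """Assemble 5'-(N)a-(X)-(N)b-3' and check the stated length limits.

    fragment: the fragment X taken from a Subset sequence.
    five_prime, three_prime: the flanking nucleotides (N)a and (N)b.
    x1: minimum fragment length (at least 7).
    x2: maximum total length (100 or less).
    """
    if x1 < 7:
        raise ValueError("x1 must be at least 7")
    if x2 > 100:
        raise ValueError("x2 must be 100 or less")
    a, b = len(five_prime), len(three_prime)
    if a > 15 or b > 15:
        raise ValueError("a and b must each be between 0 and 15")
    if len(fragment) < x1:
        raise ValueError("fragment X is shorter than x1 nucleotides")
    full = five_prime + fragment + three_prime
    if not set(full) <= VALID_NUCLEOTIDES:
        raise ValueError("sequence contains characters other than A, C, G, T")
    if len(full) > x2:  # treated as a hard limit in this sketch
        raise ValueError("total length a+b+len(X) exceeds x2")
    return full
```

For example, a 7-nucleotide fragment with two 5' and one 3' flanking nucleotides gives a 10-nucleotide construct, which satisfies the default limits.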

Antisense inhibition of streptococcal gene expression is known [e.g. Sato et al. (1998)
FEMS Microbiol. Lett. 159:241-245]. Antibacterial antisense techniques are also disclosed in
international patent applications WO99/02673 and WO99/13893.

The single-stranded nucleic acid may reduce the level of polypeptide expression from the 
complementary gene to <1% of that produced by the wild-type bacterium, preferably <0.5%, 
more preferably <0.1%, and most preferably to 0%. 

Antisense experiments may be used to determine whether genes are essential for bacterial
survival, either under normal or stress conditions. 

Screening methods 

The invention provides a method for screening compounds, wherein the method involves
contacting the compounds with a polypeptide expressed by one or more of the polynucleotides
selected from one of the Subsets of the invention. The method may be for screening for agonists
of the polypeptides, antagonists, antibiotics etc. The choice of group means, for instance, that
the method may be used for identifying an antibiotic with broad anti-streptococcal activity, or
for identifying an antibiotic specific to GBS.

Potential compounds for screening include small organic molecules, peptides, peptoids, 
polypeptides, lipids, metals, nucleotides, nucleosides, aptamers, polyamines, antibodies, and 
derivatives thereof. Small organic molecules have a molecular weight between 50 and about 
2,500 daltons, and most preferably in the range 200-800 daltons. Complex mixtures of 
substances, such as extracts containing natural products, compound libraries or the products of 
mixed combinatorial syntheses also contain potential antagonists. 

Typically, a polypeptide is incubated with a test compound, and the mixture is then tested
to see if the polypeptide and test compound interact, or to see if the polypeptide's activity is 
inhibited. 

For preferred high-throughput screening methods, all the biochemical steps for this assay 
are performed in a single solution in, for instance, a test tube or microtitre plate, and the test 
compounds are analysed initially at a single compound concentration. For the purposes of high 
throughput screening, the experimental conditions are adjusted to achieve a proportion of test 
compounds identified as "positive" compounds from amongst the total compounds screened.

The invention also provides a compound identified using these methods. These can be
used to treat or prevent streptococcal infection. The compound preferably has an affinity for the 
adhesion-specific protein of at least 10"^ M e.g. 10"^ M, 10'^ M, 10"^^ M or tighter. 

Distinguishing Streptococcal species 

The invention provides a method for determining whether a Streptococcus bacterium of 
interest is or is not in the species agalactiae, pyogenes or pneumoniae, comprising the step(s) of:
(a) contacting the bacterium with a nucleic acid probe comprising the sequence of a gene
selected from one of the Subsets of the invention; and/or (b) contacting the bacterium with an
antibody which binds to a polypeptide encoded by one or more of the polynucleotides of one or
more of the Subsets of the invention. The choice of group means, for instance, that the method
may be used for distinguishing GBS from GAS and from pneumococcus, or for confirming that a
bacterium is not a GAS or pneumococcus. 

The method will typically include the further step of detecting the presence or absence of 
an interaction between the bacterium of interest and the nucleic acid or protein. 
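
One way to interpret the outcome of such a detection step is sketched below in Python. The sketch assumes, purely for illustration, that one species-specific probe or antibody has been used per species (for example, reagents drawn from genes with no homologue in the other two species) and that each result has already been scored as present or absent; the function name `call_species` and the result strings are hypothetical.

```python
def call_species(probe_hits):
    """Interpret presence/absence results from species-specific reagents.

    probe_hits: dict mapping a species label ("GBS", "GAS" or "Spn") to
        True if an interaction was detected with that species' reagent.
    """
    positives = sorted(s for s, hit in probe_hits.items() if hit)
    if len(positives) == 1:
        return positives[0]  # exactly one species-specific reagent reacted
    if not positives:
        return "none of GBS, GAS or pneumococcus detected"
    return "ambiguous: " + ", ".join(positives)
```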

The bacterium of interest may be in a cell culture, for example, or may be within a 
biological sample believed or known to contain a streptococcus. It may be intact or may be, for 
instance, lysed. 

The term "biological sample" encompasses a variety of sample types obtained from an
organism and can be used in a diagnostic or monitoring assay. The term encompasses blood and 
other liquid samples of biological origin, solid tissue samples, such as a biopsy specimen or 
tissue cultures or cells derived therefrom and the progeny thereof. The term encompasses 
samples that have been manipulated in any way after their procurement, such as by treatment 
with reagents, solubilization, or enrichment for certain components. The term encompasses a 
clinical sample, and also includes cells in cell culture, cell supernatants, cell lysates, serum,
plasma, biological fluids, and tissue samples.

GBS 2603 Type V Genomic Sequence

Applicants have sequenced the complete genome sequence of GBS clinical type V isolate
2603 V/R and performed comparative analyses comparing this sequence with other GBS strains,
with other species of pathogenic Streptococci and with other known bacterial species. The entire
genomic sequence is available by August 26, 2002 at http://www.tigr.org. This genomic
sequence is incorporated herein by reference in its entirety. The genomic sequence of GBS type
V isolate 2603 V/R is also set forth in International Patent Application WO 02/34771.

In one embodiment, the invention relates to the polynucleotides, and fragments and 
derivatives thereof, set forth in the GBS clinical type V isolate 2603 published genome which are 
not disclosed within WO 02/34771. The invention further relates to polypeptides expressed by
the polynucleotides of the invention. 

Applicants have predicted that the GBS 2603 isolate contains approximately 2,176
genes. Each predicted gene is set forth in Table 1, listed by a SAGxxxx ORF number.
Table 1 also includes the predicted amino acid size of the expressed protein and the
predicted function, if known. The sequence of each SAG reference can be obtained at the TIGR 
website. 

Figure 1 is a circular representation of the GBS genome and comparative hybridisations 
using microarrays. A color version of Figure 1 can be found in Tettelin et al., PNAS (2002) 
99(19): 12391 - 12396 and online at www.pnas.org . The outer circle represents predicted 
coding regions on the plus strand color coded by role categories: violet indicating amino acid
biosynthesis; light blue indicating biosynthesis of cofactors, prosthetic groups, and carriers; light 
green indicating cell envelope; red indicating cellular processes; brown indicating central 
intermediary metabolism; yellow indicating DNA metabolism; light gray indicating energy 
metabolism; magenta indicating fatty acid and phospholipid metabolism; pink indicating protein
synthesis and fate; orange indicating purines, pyrimidines, nucleosides, and nucleotides; olive 
indicating regulatory functions and signal transduction; dark green indicating transcription; teal 
indicating transport and binding proteins; gray indicating unknown function; salmon indicating 
other categories; blue indicating hypothetical proteins. 

The second circle represents predicted coding regions on the minus strand. In the third
circle, black represents atypical nucleotide composition curve; green represents most atypical 
regions; magenta represents insertion elements; red diamonds indicate rRNAs. 

Circles 4-22 represent comparative hybridisations of strain 2603 V/R with 19 GBS 
strains. Cy3/Cy5 (2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as Cy3/Cy5
= 1.0 - 3.0, gene present in the test strain (no color added); Cy3/Cy5 = 3.0 - 10.0,
ambiguous result (blue); Cy3/Cy5 > 10, gene absent in test strain (red).

Circles 4-9 represent type Ia strains 090, 515, A909, Davis, and DK8. Circles 10-11
represent type Ib strains S7 7357b and H36B. Circles 12 - 13 represent type II strains 18RS21
and DK21. Circles 14 - 18 represent type III strains COH1, COH31, D136C, M732 and M781. Circle
19 represents type V strain CJB111. Circles 20 - 21 represent type VIII strains SMU014 and
JM9130013. Circle 22 represents nontypeable (NT) strain CJB110. Throughout Figure 1,
varying regions of five or more consecutive genes are indicated by yellow bullets. 

Figure 4 depicts a linear representation of the GBS genome. The location of predicted 
coding regions color-coded by biological role (see Figure 1) is displayed. Arrowed boxes 
represent the direction of transcription for each ORF. The number of membrane-spanning 
domains predicted by TopPred is displayed as lipid bi-layers on top of ORFs, only for those 
whose products have five or more predicted membrane spanning regions. Genes coding for 
rRNAs (16S, 23S, 5S) and tRNAs (clover leaf structure with number of genes) are indicated.
Predicted Rho-independent transcriptional terminators are represented by hairpins.

ORFs were predicted by GLIMMER (See Delcher, et al., (1999) Nucleic Acids Res. 27,
4636 - 4641 and Salzberg, et al., (1998) Nucleic Acids Res. 26, 544-548) trained with ORFs
larger than 600 base pairs from the genomic sequence and GBS genes available in GenBank. All
predicted proteins larger than 30 amino acids were searched against a nonredundant protein
database. (See Fleischmann, et al., (1995) Science 269, 496 - 512). Frame-shifts and point
mutations were detected and corrected where appropriate; those remaining were annotated as
"authentic frame-shift" or "authentic point mutation". Protein membrane-spanning domains
were identified by TOPPRED (See Claros, et al., (1994) Comput. Appl. Biosci. 10, 685 - 686).
Candidate lipoprotein signal peptides (See Hayashi et al., (1990) J. Bioenerg. Biomembr. 22, 451
- 471) were flagged by N-terminal exact matches to the pattern
{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C. Putative signal peptides were identified
by using SIGNALP (Nielsen, et al., (1997) Protein Eng. 10, 1 - 6). Two sets of hidden Markov models
were used to determine ORF membership in families and superfamilies: PFAM Ver. 5.5
(Bateman, et al., (2000) Nucleic Acids Res. 28, 263 - 266) and TIGRFAMS 1.0 (Haft et al.,
(2001) Nucleic Acids Res. 29, 41 - 43). Domain-based paralogous families were built by
performing all-versus-all searches on the protein sequences by using a modified version of a
previously described method (Nierman, et al., (2001) Proc. Natl. Acad. Sci. USA 98, 4136 -
4141). Potential lineage-specific gene duplications were estimated by identification of ORFs
more similar to ORFs within the GBS genome than to ORFs from other complete genomes. All
ORFs were searched with FASTA3 (Pearson (2000) Methods Mol Biol 132, 185 - 219) against
all ORFs from the complete genomes and matches with a FASTA P value of 10'^^ were
considered significant.
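
As an illustration only, the lipoprotein signal peptide pattern quoted above can be translated from its PROSITE-style notation into a regular expression. The sketch below is not the Applicants' software. The function name and the 40-residue N-terminal search window are assumptions introduced for the example and do not come from the specification.

```python
import re

# PROSITE-style pattern from the text, rewritten as a regular expression:
#   {DERK}(6)       -> six residues, none of which is D, E, R or K
#   [LIVMFWSTAG](2) -> two residues drawn from that set
#   [LIVMFYSTAGCQ]  -> one residue drawn from that set
#   [AGS]           -> one residue drawn from that set
#   C               -> the cysteine
LIPOPROTEIN_PATTERN = re.compile(
    r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C"
)


def has_lipoprotein_signal(protein, n_term_window=40):
    """Return True if the pattern occurs in the first n_term_window residues.

    The window size is an illustrative assumption, not a value from the text.
    """
    n_terminus = protein[:n_term_window].upper()
    return LIPOPROTEIN_PATTERN.search(n_terminus) is not None
```

A call such as `has_lipoprotein_signal(sequence)` on a one-letter amino acid string flags candidates only; any hit would still need the further checks described in the text.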

The genome consists of a circular chromosome of 2,160^66 base pairs with a G+C 
content of 35.7%. Base pair one of the chromosome was assigned within the putative origin of 
replication. The genome contains 80 tRNAs, 7 rRNAs, and 3 sRNAs. Approximately 78% of the
2,176 predicted genes are transcribed in the same direction as that of DNA replication, a feature
also observed in S. pn. and other low-GC Gram positive organisms. 

Biological roles were assigned to 1,409 (65%) of the genome according to a classification 
scheme adapted from Riley (1993) Microbiol Rev. 57, 862 - 952. Another 527 predicted 
proteins (24%) matched proteins of unknown function, and the remaining 240 (11%) had no
database match. The expression of 50 of these hypothetical proteins was confirmed by Western
Blot analysis, and the proteins were annotated as "proteins of unknown function." A total of 339
paralogous protein families were identified in strain 2603, containing 941 predicted proteins
(43% of the total). 

The Western Blot analysis was conducted as follows. GBS strain 2603 V/R cells were
grown in Todd-Hewitt broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20
minutes at 5,000 rpm. The supernatant was discarded, and bacteria were washed once with PBS,
resuspended in 2 ml of 50 mM Tris-HCl pH 6.8, containing 400 units of Mutanolysin (Sigma),
and incubated 2 hours at 37°C. After three cycles of freeze and thaw, cellular debris was
removed by centrifugation at 14,000 rpm for 10 minutes, and the protein concentration of the
supernatant was measured by the Bio-Rad Protein assay, with BSA as a standard. Purified
recombinant proteins (50 ng) and total cell extracts (25 µg) derived from GBS serotype V 2603
V/R strain were separated by SDS/PAGE and electroblotted onto nitrocellulose membranes for 1
hour at 100 V. The membranes were saturated by overnight incubation at 4°C in 5% skimmed
milk and 0.1% Tween 20 in PBS and incubated for 1 hour at room temperature with sera from
immunized mice diluted 1:500 - 1:1,000 in saturation buffer. To reduce background due to
antibodies raised against contaminating E. coli proteins, sera were preincubated with E. coli
protein extracts absorbed on nitrocellulose strips. The membranes were washed twice in 3%
skimmed milk and 0.1% Tween 20 in PBS and incubated for 1 hour with a 1:1,000 dilution of
horseradish peroxidase-conjugated antimouse Ig (DAKO). After washing with 0.1% Tween 20
in PBS, the membranes were developed with the Opti-4CN Substrate Kit (Bio-Rad).

Table 2 comprises a list of predicted and experimentally characterized surface and
secreted proteins from GBS. Candidate signal peptides and lipoprotein motifs were predicted
with PSORT [Nakai, K. & Horton, P. (1999) Trends Biochem Sci 24, 34-6] and other methods
(see methods); sortase motifs (LPxTG) were detected using the FINDPATTERNS program of
the GCG Package [Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res 12,
387-95] and hidden Markov models. Column "Other" indicates proteins that carry other motifs (e.g.
integrin-binding motif RGD) or are similar to characterized surface-exposed proteins. Western
blot results were considered positive when the antibodies revealed a predominant band of the
expected molecular weight on the total protein extracts of S. agalactiae strain 2603 V/R. ORFs
without + or - in this column were not tested in western blot. FACS analyses were performed
for western blot positive proteins only. Western blot and FACS data are displayed only for
proteins carrying at least one of the other motifs shown in the table. Column "GBS specific"
indicates genes unique to S. agalactiae (when compared to other completely sequenced
genomes) that are present in all the S. agalactiae strains tested in comparative genome
hybridization analyses. Finally, only proteins carrying fewer than 3 predicted transmembrane
domains are shown in the table; other proteins are likely to be embedded in the cytoplasmic
membrane and are probably not exposed on the organism's surface.

FACS data were collected as follows: GBS 2603 V/R strain cells were grown in Todd-Hewitt
broth (Difco) to OD600nm = 0.5. The culture was centrifuged for 20 minutes at 5,000
rpm, and bacteria were washed once with PBS, resuspended in PBS containing 0.05%
paraformaldehyde, and incubated for 1 hour at 37°C and then overnight at 4°C. Fifty microliters
of fixed bacteria (OD600nm 0.1) was washed once with PBS, resuspended in 20 µl of newborn
calf serum (Sigma), and incubated for 1 hour at 4°C in 100 µl of preimmune or immune sera
diluted 1:200 in dilution buffer (PBS, 20% newborn calf serum, 0.1% BSA). After
centrifugation and washing with 200 µl of washing buffer (0.1% BSA in PBS), samples were
incubated for 1 hour at 4°C with 50 µl of R-phycoerythrin-conjugated F(ab)2 goat anti-mouse
IgG (Jackson ImmunoResearch) diluted 1:100 in dilution buffer. Cells were washed with 200 µl
of washing buffer and resuspended in 200 µl of PBS. Samples were analysed by using a
FACSCalibur apparatus (Becton Dickinson), and data were analyzed by using CELL QUEST (Becton
Dickinson). A shift in mean fluorescence intensity of >75 channels compared with preimmune
sera from the same mice was considered positive. This cutoff was determined from the mean
plus two standard deviations of shifts obtained with control sera raised against mock purified
recombinant proteins from cultures of E. coli carrying the empty expression vector and included
in every experiment. Artifacts due to bacterial lysis were excluded by using antisera raised
against six different known cytoplasmic proteins, all of which gave negative results.
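
The positivity rule described above (a cutoff equal to the mean plus two standard deviations of the control shifts) can be sketched as follows. This is a minimal illustration rather than the Applicants' analysis code. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def positivity_cutoff(control_shifts):
    """Mean plus two standard deviations of control-serum fluorescence shifts.

    Uses the sample standard deviation (an assumption; the text does not
    specify sample versus population).
    """
    return mean(control_shifts) + 2 * stdev(control_shifts)


def is_positive(immune_mfi, preimmune_mfi, cutoff=75.0):
    """Score a protein positive when the shift in mean fluorescence
    intensity (immune minus preimmune, in channels) exceeds the cutoff."""
    return (immune_mfi - preimmune_mfi) > cutoff
```

The default of 75 channels is the value stated in the text; `positivity_cutoff` shows how such a value could be derived from a set of control measurements.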

Regions of Atypical Nucleotide Composition. 

These regions were identified by χ² analysis: the distribution of all 64 trinucleotides
(3-mers) was computed for the complete genome in all six reading frames, followed by the 3-mer
distribution in 2,000-bp windows. Windows overlapped by 1,000 bp. For each window, the χ²
statistic on the difference between its 3-mer content and that of the whole genome was
computed.
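
The windowed χ² calculation can be sketched as below. This is an illustrative reading of the description, not the original implementation. The assumptions are that 3-mers are counted at every offset on both strands (which covers all six reading frames), that 3-mers containing ambiguous bases are skipped, and that the chromosome is treated as linear (no wrap-around window). All function names are hypothetical.

```python
from collections import Counter
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 trinucleotides
VALID = set(KMERS)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trimer_counts(seq):
    """Count overlapping 3-mers on both strands (all six reading frames)."""
    counts = Counter()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand in (seq, reverse_complement):
        for i in range(len(strand) - 2):
            kmer = strand[i:i + 3]
            if kmer in VALID:  # skip 3-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def chi_square_windows(genome, window=2000, step=1000):
    """Return (window_start, chi_square) for each overlapping window."""
    genome = genome.upper()
    total = trimer_counts(genome)
    n_total = sum(total.values())
    if n_total == 0:
        return []
    freqs = {k: total[k] / n_total for k in KMERS}
    results = []
    for start in range(0, len(genome) - window + 1, step):
        observed = trimer_counts(genome[start:start + window])
        n_obs = sum(observed.values())
        chi2 = 0.0
        for k in KMERS:
            expected = freqs[k] * n_obs  # count expected from genome-wide usage
            if expected > 0:
                chi2 += (observed[k] - expected) ** 2 / expected
        results.append((start, chi2))
    return results
```

Windows with the largest statistic are the candidates for atypical composition; the text does not state the threshold used to call a region atypical, so none is applied here.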

In Silico Genome Comparisons 

The protein sets of S. agalactiae, Streptococcus pneumoniae and S. pyogenes were 
compared by using FASTA3. A general description of the FASTA3 sequence comparison
program is given in Pearson, W.R., "Flexible Sequence Similarity Searching with the
FASTA3 Program Package", (2000) Methods Mol Biol, 132: 185-219. Shared genes were
defined using a FASTA3 P value cutoff of 10"^^. These shared genes and genes that S. agalactiae
did not share with the other streptococci using this cutoff were subsequently searched against all
completely sequenced genomes, and genes were defined as unique to streptococci or S.
agalactiae when they did not share similarity with any other gene sets with a FASTA3 P value of
10'^ or lower. The use of two cutoffs provides for a more stringent analysis of shared or unique
genes. 

Figure 2 is a schematic representation of in silico comparisons between streptococci. The 
protein sets of GBS, S. pn., and GAS were compared by using FASTA3. Numbers under the 
species name indicate genes that are not shared with the other species; values in parentheses are
the number of proteins in each species (excluding frame-shifted and degenerated genes).
Numbers in the intersections indicate genes shared by two or three species. These are displayed
in the color corresponding to the species used as the query (GBS: green; S.pn.: blue; GAS:
red). A color version of Figure 2 can be found in Tettelin et al., PNAS (2002) 99(19): 12391 -
12396 and online at www.pnas.org. Numbers in any given intersection are slightly different
due to gene duplications in some species.

Table 3 lists genes which were shared among GBS, GAS and pneumococcus, but which 
were not found in any of the other completely sequenced genomes. The protein sets of
S. agalactiae, S. pneumoniae, and S. pyogenes were compared using FASTA3 [Pearson, W. R.
(2000) Methods Mol Biol 132, 185-219]. Shared genes were defined using a FASTA3 p value
cutoff of 10"^^. These shared genes and genes that S. agalactiae did not share with the other
streptococci using this cutoff were subsequently searched against all completely sequenced
genomes and genes were defined as unique to streptococci or S. agalactiae when they did not
share similarity with any other gene sets with a FASTA3 p value of 10"^ or lower.

Synteny

Regions of conservation of gene synteny were computed as windows of 10 kb spanning 
at least three genes whose order was conserved in the other species. Regions were merged if 
they were less than 20 kb apart. The number of genes within each broad region was then 
calculated. 
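
The merging and counting steps can be sketched as follows. The example assumes the conserved 10 kb windows have already been found and are supplied as (start, end) coordinates in base pairs. The function names, and the choice to count only genes lying entirely within a merged region, are assumptions made for illustration.

```python
def merge_regions(regions, max_gap=20_000):
    """Merge conserved-synteny windows that are less than max_gap bp apart.

    regions: iterable of (start, end) coordinates in base pairs.
    """
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


def genes_per_region(regions, genes):
    """Count genes lying entirely within each merged region.

    genes: iterable of (start, end) gene coordinates.
    """
    return [
        sum(1 for g_start, g_end in genes
            if g_start >= r_start and g_end <= r_end)
        for r_start, r_end in regions
    ]
```

For example, windows at 0-10 kb and 25-35 kb are 15 kb apart and would be merged into one broad region, whereas a window at 60-70 kb would remain separate.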

Comparative Genome Hybridizations 

Comparative genome hybridizations (See Figure 1) using DNA microarrays were 
performed between the sequenced type V strain 2603 V/R and 19 other GBS strains of multiple 
serotypes (See Table %). Predicted genes from strain 2603 V/R were amplified by PCR and
arrayed on glass microscope slides. See Peterson, et al., (2000) J. Bacteriol. 182, 6192-6202.
Genomic DNA was labelled according to protocols provided by J. DeRisi
(www.microarrays.org/pdfs/GenomicDNALabel_B.pdf), except that the DNA was not digested
or sheared before labelling. Arrays were scanned with a GENEPIX 4000B scanner (Axon
Instruments, Foster City, CA), and individual hybridisation signals were quantitated with TIGR
SPOTFINDER. See Hegde, et al., (2000) BioTechniques 29, 548-550, 552-554, 556. Cy3/Cy5
(2603 V/R signal/test strain) ratio cutoffs were defined arbitrarily as Cy3/Cy5 = 1.0-3.0, gene
present in test strain; 3.0 - 10.0, ambiguous result; >10.0, gene absent. For ambiguous results,
the gene may be divergent in the test strain relative to 2603 V/R, or the gene may be absent in
the test strain but still produce signal because of a paralogous gene family or a repetitive
element. Although the cutoffs are arbitrary, they fit well with the results for the variation of the
capsule locus in the strains tested (see region 9 on Figure 1), where most genes are slightly
divergent and only a few are completely different.
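
The ratio cutoffs above amount to a three-way classification, sketched here for illustration. The function name is hypothetical. Two boundary choices are assumptions, because the text leaves them open: a ratio of exactly 3.0 is treated as ambiguous, and ratios below 1.0 are treated as present.

```python
def classify_ratio(cy3_cy5):
    """Classify a gene in a test strain from its Cy3/Cy5 ratio
    (2603 V/R signal divided by test-strain signal)."""
    if cy3_cy5 > 10.0:
        return "absent"
    if cy3_cy5 >= 3.0:  # assumption: the shared boundary 3.0 counts as ambiguous
        return "ambiguous"
    return "present"    # assumption: ratios below 1.0 are also scored present
```

Applied to every gene in every test strain, this yields the present, ambiguous and absent calls that are color coded in Figure 1.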

The CGH detected 1,698 genes in all of the strains, whereas 401 genes from strain 2603 
V/R (18% of the gene complement) were not detected in at least one other strain, suggesting that 
they are absent or significantly divergent in those strains. Two hundred sixty (38%) of the 683 
genes specific to S. agalactiae when compared with the other two streptococci (Fig. 2), including
virulence determinants and surface proteins, vary among S. agalactiae strains, whereas only 47
(4%) of the genes common to all three streptococcal species, including 5 of the 6 sortases
identified in the genome, vary among strains. Thus, the in silico analysis of genes shared by the
streptococci that are not expected to vary among this genus is consistent with the CGH analysis.
Forty-four (25%) of the genes shared by S. agalactiae and S. pneumoniae and 44 (20%) of those
shared by S. agalactiae and S. pyogenes vary in the CGH analysis. The first set contains many
glycosyl transferases and proteins carrying a cell-wall anchor, whereas the second set displays
many phage-related genes. One hundred thirty-six of the 315 genes unique to S. agalactiae
when compared with all sequenced genomes vary among strains. These include R5, three
capsular genes, two cell wall-anchored proteins, and three transcriptional regulators. Three
hundred sixty-four (91%) of the 401 varying genes correspond to 15 regions containing more
than 5 contiguous genes. Ten of these regions display an atypical nucleotide composition in
strain 2603 V/R (Fig. 1), consistent with the possibility that they were horizontally transferred
into this strain. Two of the largest regions (region 4, a prophage and region 7, similar to Tn916
from Enterococcus faecalis) are flanked by insertion sequence elements. The 15 regions contain
many proteins predicted to be anchored on the cell wall or surface exposed, including Rib
(region 3), sortases, glycosyl transferases, the capsule locus (region 9, divergent in all strains but
the other type V strain CJB111), and phage-related genes. Region 14 is unique to S. agalactiae
and spans 33 genes (SAG1989-SAG2021), including 25 proteins of unknown function, some of
which carry a cell-wall anchor. It is flanked by an ISL3 transposase and displays an atypical
nucleotide composition. Region 1, unique to S. agalactiae, is a possible plasmid or remnant of a
phage (SAG0218-SAG0238), contains mostly hypothetical proteins, and is flanked by a
site-specific recombinase. Region 8 is specific to S. agalactiae, comprises 20 proteins of unknown
function (SAG1018-SAG1037), most of which are predicted to be membrane associated or
secreted, and displays an atypical nucleotide composition. 

The CGH results were analyzed by profile clustering where genes are grouped based on
their distribution patterns (Fig. 5). Sixteen clusters of five or more contiguous and
noncontiguous genes comprising a total of 300 genes were identified (Table 6). Several clusters 
correspond to regions of contiguous genes described above. Some clusters of genes that do not 
share sequence similarity and are located at different loci in the genome display an identical 
profile. For instance, a cluster of genes containing a surface antigen (SAG0674-SAG0681) 
follows the same distribution as another cluster containing only hypothetical proteins (SAG0247-
SAG0249). A putative pathogenicity protein (SAG2063) also clusters with a region containing 
several glycosyl transferases and Sec proteins (SAG1447-SAG1462).

Profile clustering was also used to group strains based on similarity of gene content (Fig.
5). In addition, the sequences of 19 genes from each of 11 S. agalactiae strains were determined
after PCR amplification and used for phylogenetic analyses. The strains were the following: type
Ia, 090 and A909; type Ib, H36B; type II, 18RS21; type III, COH1, M732 and M781; type V,
2603 V/R and 1169NT1; type VIII, JM9130013; and nontypeable strain CJB110. The set
comprised 8 housekeeping genes and 11 genes coding for proteins predicted to be
surface-exposed (Table 7).

The profile clustering was conducted as follows. The information on presence and absence of
genes based on the comparative genome hybridisation results was used to group genes based on
their distribution patterns. The analysis used was essentially identical to that used for phylogenetic
profile analysis. See Pellegrini, et al., (1999) Proc. Natl. Acad. Sci. USA 96, 4285 - 4288.
Each gene was assigned a binary profile based on its presence or absence across the different
strains, with presence determined by a Cy3/Cy5 ratio < 3.0 and absence > 3.0. The gene profiles
were then clustered by using the single-linkage clustering algorithm with column weighting (all
with default settings) of CLUSTER (http://rana.lbl.gov). The CLUSTER program also groups
the strains (columns) based on similarity of gene profiles. Clusters of genes and strains were
viewed by using TREEVIEW (http://rana.lbl.gov).
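
The binary profile assignment can be sketched as below. This example only groups genes whose profiles are identical; it is not the single-linkage algorithm of the CLUSTER program and does not reproduce its column weighting. The function names and gene labels are hypothetical, and a ratio of exactly 3.0 is scored as absent by assumption.

```python
from collections import defaultdict


def binary_profile(ratios, cutoff=3.0):
    """Convert per-strain Cy3/Cy5 ratios into a presence/absence profile.

    1 = gene present in the strain (ratio below the cutoff), 0 = absent.
    A ratio exactly equal to the cutoff is scored absent (an assumption).
    """
    return tuple(1 if ratio < cutoff else 0 for ratio in ratios)


def group_identical_profiles(ratio_table):
    """Group gene names that share the same presence/absence profile.

    ratio_table: dict mapping gene name -> list of ratios, one per strain,
    with strains in the same order for every gene.
    """
    groups = defaultdict(list)
    for gene, ratios in ratio_table.items():
        groups[binary_profile(ratios)].append(gene)
    return dict(groups)
```

Genes that fall into the same group here correspond to the profiles that Figure 5 collapses into a single horizontal line.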

Phylogenetic trees were inferred for the complete set of 19 genes and for the subsets of 
housekeeping and surface-exposed genes. Because the branching patterns in all three trees were
identical, only the tree of the 19 genes is shown in Fig. 3. The degree of polymorphism of the
housekeeping and the surface-exposed genes is similar (~1 variable site among all of the strains
per 100 bp). 
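
A polymorphism figure of this kind can be obtained by counting alignment columns in which the strains do not all carry the same base. The sketch below is illustrative only. It assumes a gap-free alignment of equal-length sequences (consistent with the trimming step described for the alignments), and the function name is hypothetical.

```python
def variable_sites_per_100bp(aligned_seqs):
    """Number of variable alignment columns per 100 bp.

    aligned_seqs: list of equal-length, gap-free aligned sequences,
    one per strain.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    # A column is variable when more than one distinct base occurs in it.
    variable = sum(1 for column in zip(*aligned_seqs) if len(set(column)) > 1)
    return 100.0 * variable / length
```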

The sequences of genes from the different strains were aligned by using CLUSTALW
(See Thompson (1994), Nucleic Acids Res. 22, 4673 - 4680.) and trimmed to remove
ambiguously aligned regions. Phylogenetic trees of individual genes and of concatenated
alignments of multiple genes were inferred by using maximum likelihood methods of PAUP*
4.0b10 (Sinauer, Sunderland, MA). Bootstrap analysis was carried out using PAUP* as well.
The possibility of recombination among strains was examined by using analysis of sequence 
variation using SIMPLOT (S.C. Ray) and analysis of phylogenetic heterogeneity by using 
MACCLADE (Sinauer). 

Analysis of this variation showed no evidence for major recombination events between 
the strains. There were no long stretches of polymorphic sites that strongly supported other trees 
(analysis with MACCLADE), and there were no significant crossover events in plots of sequence
similarity between strains (analysis with SIMPLOT). Some strain groupings (clades) generated
by phylogenetic analysis were similar to clusters from the profile analysis (type III strains M781,
M732 and COH1; type Ia strain 090 and nontypeable strain CJB110), whereas others were
different, possibly because of the aforementioned problems with the profile clustering. In both
the phylogenetic analysis and the profile clustering, there is serotype-dependent and -independent
clustering (Figs. 3 and 5). The presence of strains of the same serotype in different clades or 
clusters could be due to lateral gene transfer. 

Figure 5 demonstrates phylogenetic profiling of GBS strains based on comparative 
genome hybridisations. The information on presence and absence of genes based on the
microarray comparative genome hybridization results was used for phylogenetic profile analysis.
The presence of a particular gene or gene cluster is indicated in the figure by a red square and the 
absence of a gene or cluster by a black square. The relationship between strains based on this 
analysis is depicted by the tree at the top of the figure. The strains and their serotypes are 
indicated (NT: nontypeable). Clusters with identical profiles are reduced to a single horizontal 
line and the number of genes in each cluster is indicated on the right. The clusters of 5 or more 
genes, labeled in red text and numbered, are listed in Table 6. The 1,698 genes shared by all 19
strains are labeled in green text.

Figure 3 depicts a phylogenetic tree of GBS strains based on PCR sequences. The
sequences of 19 genes (Table 7) from each of 11 GBS strains were aligned and trimmed to
remove ambiguously aligned regions, and phylogenetic trees were inferred. Strain names are
indicated in bold, and serotypes are indicated under the strain names. Bootstrap values are 
indicated on the branches. 

Techniques 

A summary of standard techniques and procedures which may be employed in order to
perform the invention (e.g. to utilise the disclosed sequences for vaccination or diagnostic
purposes) follows. This summary is not a limitation on the invention, but gives examples that
may be used, but are not required.

General 

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of
molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature e.g. Sambrook Molecular Cloning: A Laboratory
Manual, Second Edition (1989) or Third Edition (2000); DNA Cloning, Volumes I and II (D.N. Glover ed.
1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization (B.D. Hames & S.J.
Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J. Higgins eds. 1984); Animal Cell
Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes (IRL Press, 1986); B. Perbal, A Practical
Guide to Molecular Cloning (1984); the Methods in Enzymology series (Academic Press, Inc.), especially
volumes 154 & 155; Gene Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987,
Cold Spring Harbor Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and
Molecular Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and
Practice, Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C. C. Blackwell eds 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

Further Definitions

A composition containing X is "substantially free of" Y when at least 85% by weight of the total X+Y in
the composition is X. Preferably, X comprises at least about 90% by weight of the total of X+Y in the
composition, more preferably at least about 95% or even 99% by weight.

The term "comprising" means "including" as well as "consisting" e.g. a composition "comprising" X may
consist exclusively of X or may include something additional e.g. X + Y.

The singular forms "a", "an", and "the" include plural referents unless the context clearly dictates
otherwise. Thus, for example, reference to "a polynucleotide" includes a plurality of such polynucleotides
and reference to "an epithelial cell" includes reference to one or more cells and equivalents thereof known
to those skilled in the art, etc.

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the heterologous
components are not found together in nature, they can function together, as when a promoter heterologous
to a gene is operably linked to the gene. Another example is where a Streptococcal sequence is heterologous
to a mouse host cell. A further example would be two epitopes from the same or different proteins which
have been assembled in a single protein in an arrangement not found in nature.

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous unit of 
polynucleotide replication within a cell, capable of replication under its own control. An origin of 
replication may be needed for a vector to replicate in a particular host cell. With certain origins of
replication, an expression vector can be reproduced at a high copy number in the presence of the appropriate
proteins within the cell. Examples of origins are the autonomously replicating sequences, which are
effective in yeast; and the viral T-antigen, effective in COS-7 cells.

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having sequence
identity with the native or disclosed sequence. Depending on the particular sequence, the degree of
sequence identity between the native or disclosed sequence and the mutant sequence is preferably greater
than 50% (e.g. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the Smith-Waterman algorithm
as described above). As used herein, an "allelic variant" of a nucleic acid molecule, or region, for which
nucleic acid sequence is provided herein is a nucleic acid molecule, or region, that occurs essentially at the
same locus in the genome of another or second isolate, and that, due to natural variation caused by, for
example, mutation or recombination, has a similar but not identical nucleic acid sequence. A coding region
allelic variant typically encodes a protein having similar activity to that of the protein encoded by the gene
to which it is being compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated
regions of the gene, such as in regulatory control regions (e.g. see US patent 5,753,235).

Expression systems

The Streptococcal nucleotide sequences can be expressed in a variety of different expression systems; for 
example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable 
of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding
sequence (e.g. structural gene) into mRNA. A promoter will have a transcription initiating region, which is
usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base
pairs (bp) upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase
II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream
promoter element, usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter
element determines the rate at which transcription is initiated and can act in either orientation [Sambrook et
al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A Laboratory
Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences 
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include the
SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter (Ad
MLP), and herpes simplex virus promoter. In addition, sequences derived from non-viral genes, such as the
murine metallothionein gene, also provide useful promoter sequences. Expression may be either
constitutive or regulated (inducible), depending on the promoter, which can be induced with glucocorticoid in
hormone-responsive cells. 

The presence of an enhancer element (enhancer), combined with the promoter elements described above,
will usually increase expression levels. An enhancer is a regulatory DNA sequence that can stimulate
transcription up to 1000-fold when linked to homologous or heterologous promoters, with synthesis
beginning at the normal RNA start site. Enhancers are also active when they are placed upstream or
downstream from the transcription initiation site, in either normal or flipped orientation, or at a distance of
more than 1000 nucleotides from the promoter [Maniatis et al. (1987) Science 236:1237; Alberts et al.
(1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements derived from viruses may be particularly
useful, because they usually have a broader host range. Examples include the SV40 early gene enhancer
[Dijkema et al. (1985) EMBO J. 4:761] and the enhancer/promoters derived from the long terminal repeat
(LTR) of the Rous Sarcoma Virus [Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6177] and from human
cytomegalovirus [Boshart et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and
become active only in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and
Borelli (1986) Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, the
N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion of the foreign protein in mammalian cells. Preferably, there are processing sites encoded between
the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion
of the protein from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides
for secretion of a foreign protein in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are
regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements,
flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349; Proudfoot and Whitelaw
(1988) "Termination and 3' end processing of eukaryotic RNA." In Transcription and splicing (ed. B.D.
Hames and D.M. Glover); Proudfoot (1989) Trends Biochem. Sci. 14:105]. These sequences direct the
transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples of
transcription terminator/polyadenylation signals include those derived from SV40 [Sambrook et al. (1989)
"Expression of cloned genes in cultured mammalian cells." In Molecular Cloning: A Laboratory Manual].
Usually, the above described components, comprising a promoter, polyadenylation signal, and transcription
termination sequence are put together into expression constructs. Enhancers, introns with functional splice
donor and acceptor sites, and leader sequences may also be included in an expression construct, if desired.
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as mammalian cells or bacteria. Mammalian
replication systems include those derived from animal viruses, which require trans-acting factors to
replicate. For example, plasmids containing the replication systems of papovaviruses, such as SV40
[Gluzman (1981) Cell 23:175] or polyomavirus, replicate to extremely high copy number in the presence of
the appropriate viral T antigen. Additional examples of mammalian replicons include those derived from
bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a prokaryotic host
for cloning and amplification. Examples of such mammalian-bacteria shuttle vectors include pMT2
[Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al. (1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for introduction of
heterologous polynucleotides into mammalian cells are known in the art and include dextran-mediated
transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA
into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many immortalized
cell lines available from the American Type Culture Collection (ATCC), including but not limited to,
Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells
(COS), human hepatocellular carcinoma cells (eg. Hep G2), and a number of other cell lines.

ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is
operably linked to the control elements within that vector. Vector construction employs techniques which
are known in the art. Generally, the components of the expression system include a transfer vector, usually a
bacterial plasmid, which contains both a fragment of the baculovirus genome, and a convenient restriction
site for insertion of the heterologous gene or genes to be expressed; a wild type baculovirus with a sequence
homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous
recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and
growth media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild type
viral genome are transfected into an insect host cell where the vector and viral genome are allowed to
recombine. The packaged recombinant virus is expressed and recombinant plaques are identified and
purified. Materials and methods for baculovirus/insect cell expression systems are commercially available
in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit). These techniques are generally
known to those skilled in the art and fully described in Summers & Smith, Texas Agricultural Experiment
Station Bulletin No. 1555 (1987) ("Summers & Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above described
components, comprising a promoter, leader (if desired), coding sequence, and transcription termination
sequence, are usually assembled into an intermediate transplacement construct (transfer vector). This may
contain a single gene and operably linked regulatory elements; multiple genes, each with its own set of
operably linked regulatory elements; or multiple genes, regulated by the same set of regulatory elements.
Intermediate transplacement constructs are often maintained in a replicon, such as an extra-chromosomal
element (e.g. plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have
a replication system, thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373.
Many other vectors, known to those of skill in the art, have also been designed. These include, for example,
pVL985 (which alters the polyhedrin start codon from ATG to ATT, and which introduces a BamHI
cloning site 32 basepairs downstream from the ATT; see Luckow and Summers, Virology (1989) 17:31).
The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. Rev.
Microbiol. 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of replication for
selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA
sequence capable of binding a baculovirus RNA polymerase and initiating the downstream (5' to 3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription
initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A
baculovirus transfer vector may also have a second domain called an enhancer, which, if present, is usually
distal to the structural gene. Expression may be either regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful
promoter sequences. Examples include sequences derived from the gene encoding the viral polyhedrin
protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in: The Molecular Biology
of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155 476; and the gene encoding the
p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:165.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus
proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene, 75:409). Alternatively,
since the signals for mammalian cell posttranslational modifications (such as signal peptide cleavage,
proteolytic cleavage, and phosphorylation) appear to be recognized by insect cells, and the signals required
for secretion and nuclear accumulation also appear to be conserved between the invertebrate cells and
vertebrate cells, leaders of non-insect origin, such as those derived from genes encoding human
α-interferon, Maeda et al., (1985), Nature 315:592; human gastrin-releasing peptide, Lebacq-Verheyden et al.,
(1988), Molec. Cell. Biol. 8:3129; human IL-2, Smith et al., (1985) Proc. Natl. Acad. Sci. USA, 82:8404;
mouse IL-3, Miyajima et al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al., (1988)
DNA, 7:99, can also be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the 
proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins 
usually requires heterologous genes that ideally have a short leader sequence containing suitable translation 
initiation signals preceding an ATG start signal. If desired, methionine at the N-terminus may be cleaved
from the mature protein by in vitro incubation with cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from 
the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised of a leader
sequence fragment that provides for secretion of the foreign protein in insects. The leader sequence 
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the 
translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the 
protein, an insect cell host is co-transformed with the heterologous DNA of the transfer vector and the 
genomic DNA of wild type baculovirus - usually by co-transfection. The promoter and transcription 
termination sequence of the construct will usually comprise a 2-5kb section of the baculovirus genome.
Methods for introducing heterologous DNA into the desired site in the baculovirus virus are known in the
art. (See Summers & Smith supra; Ju et al. (1987); Smith et al., Mol. Cell. Biol. (1983) 3:2156; and Luckow
and Summers (1989)). For example, the insertion can be into a gene such as the polyhedrin gene, by
homologous double crossover recombination; insertion can also be into a restriction enzyme site engineered
into the desired baculovirus gene. Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in
place of the polyhedrin gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific 
sequences and is positioned downstream of the polyhedrin promoter. 

The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant 
baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus, 
the majority of the virus produced after cotransfection is still wild-type virus. Therefore, a method is 
necessary to identify recombinant viruses. An advantage of the expression system is a visual screen 
allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced by the native 
virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection.
Accumulated polyhedrin protein forms occlusion bodies that also contain embedded particles. These
occlusion bodies, up to 15 µm in size, are highly refractile, giving them a bright shiny appearance that is
readily visualized under the light microscope. Cells infected with recombinant viruses lack occlusion
bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto
a monolayer of insect cells by techniques known to those skilled in the art. Namely, the plaques are
screened under the light microscope for the presence (indicative of wild-type virus) or absence (indicative
of recombinant virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds)
at 16.8 (Supp. 10, 1990); Summers & Smith, supra; Miller et al. (1989). 
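Because recombination is stated above to occur at a frequency of about 1% to about 5%, the screening burden can be estimated with a short calculation. The sketch below is illustrative only: it assumes each plaque is an independent draw that is recombinant with probability f, and it asks how many plaques must be examined for at least one recombinant to appear with a chosen probability. The function name `plaques_to_screen` and the 95% default are assumptions introduced for this example.

```python
import math


def plaques_to_screen(recomb_fraction, confidence=0.95):
    """Smallest n such that P(at least one recombinant among n plaques) >= confidence.

    Assumes plaques are independent and each is recombinant with probability
    recomb_fraction, so P(no recombinant in n) = (1 - recomb_fraction) ** n.
    """
    if not 0 < recomb_fraction < 1:
        raise ValueError("recomb_fraction must be between 0 and 1 (exclusive)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    # Solve (1 - f) ** n <= 1 - confidence for n and round up.
    return math.ceil(math.log(1 - confidence) / math.log(1 - recomb_fraction))


if __name__ == "__main__":
    for f in (0.01, 0.05):
        print(f"recombinant fraction {f:.0%}: screen {plaques_to_screen(f)} plaques")
```

Under these assumptions, roughly 299 plaques would be examined at a 1% recombination frequency and 59 at 5%, which is consistent with the need for a rapid visual screen such as the occlusion-body phenotype.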

Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For 
example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti, Autographa
californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (WO
89/046699; Carbonell et al., (1985) J. Virol. 55:153; Wright (1986) Nature 321:718; Smith et al., (1983)
Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally known to
those skilled in the art. See, eg. Summers & Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable
maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is
under inducible control, the host may be grown to high density, and expression induced. Alternatively,
where expression is constitutive, the product will be continuously expressed into the medium and the
nutrient medium must be continuously circulated, while removing the product of interest and augmenting
depleted nutrients. The product may be purified by such techniques as chromatography, eg. HPLC, affinity
chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation;
solvent extraction, etc. As appropriate, the product may be further purified, as required, so as to remove
substantially any insect proteins which are also present in the medium, so as to provide a product which is at
least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are incubated 
under conditions which allow expression of the recombinant protein encoding sequence. These conditions 
will vary, dependent upon the host cell selected. However, the conditions are readily ascertainable to those 
of ordinary skill in the art, based upon what is known in the art.

iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. Exemplary
plant cellular genetic expression systems include those described in patents, such as: US 5,693,506; US
5,659,122; and US 5,608,143. Additional examples of genetic expression in plant cell culture have been
described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions of plant protein signal peptides may
be found in addition to the references described above in Vaulcombe et al., Mol. Gen. Genet. 209:33-40
(1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738
(1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et al., Nucleic Acids Research 15:2515-2535
(1987); Wirsel et al., Molecular Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A
description of the regulation of plant gene expression by the phytohormone, gibberellic acid and secreted
enzymes induced by gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins, in:
Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038 (1990);
Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The expression
cassette is inserted into a desired expression vector with companion sequences upstream and downstream
from the expression cassette suitable for expression in a plant host. The companion sequences will be of
plasmid or viral origin and provide necessary characteristics to the vector to permit the vectors to move
DNA from an original cloning host, such as bacteria, to the desired plant host. The basic bacterial/plant
vector construct will preferably provide a broad host range prokaryote replication origin; a prokaryote
selectable marker; and, for Agrobacterium transformations, T DNA sequences for Agrobacterium-mediated
transfer to plant chromosomes. Where the heterologous gene is not readily amenable to detection, the
construct will preferably also have a selectable marker gene suitable for determining if a plant cell has been
transformed. A general review of suitable markers, for example for the members of the grass family, is
found in Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome are also 
recommended. These might include transposon sequences and the like for homologous recombination as 
well as Ti sequences which permit random insertion of a heterologous expression cassette into a plant
genome. Suitable prokaryote selectable markers include resistance toward antibiotics such as ampicillin or
tetracycline. Other DNA sequences encoding additional functions may also be present in the vector, as is 
known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette for
expression of the protein(s) of interest. Usually, there will be only one expression cassette, although two or
more are feasible. The recombinant expression cassette will contain in addition to the heterologous protein
encoding sequence the following elements: a promoter region, plant 5' untranslated sequences, initiation
codon depending upon whether or not the structural gene comes equipped with one, and a transcription and
translation termination sequence. Unique restriction enzyme sites at the 5' and 3' ends of the cassette allow
for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The sequence
encoding the protein of interest will encode a signal peptide which allows processing and translocation of
the protein, as appropriate, and will usually lack any sequence which might result in the binding of the
desired protein of the invention to a membrane. Since, for the most part, the transcriptional initiation region
will be for a gene which is expressed and translocated during germination, by employing the signal peptide
which provides for translocation, one may also provide for translocation of the protein of interest. In this
way, the protein(s) of interest will be translocated from the cells in which they are expressed and may be
efficiently harvested. Typically, secretion in seeds is across the aleurone or scutellar epithelium layer into
the endosperm of the seed. While it is not required that the protein be secreted from the cells in which the
protein is produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell, it is desirable to
determine whether any portion of the cloned gene contains sequences which will be processed out as introns
by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron" region may be
conducted to prevent losing a portion of the genetic message as a false intron code, Reed and Maniatis, Cell
41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically transfer the
recombinant DNA. Crossway, Mol. Gen. Genet., 202:179-185, 1985. The genetic material may also be
transferred into the plant cell by using polyethylene glycol, Krens, et al., Nature, 296, 72-74, 1982. Another
method of introduction of nucleic acid segments is high velocity ballistic penetration by small particles with
the nucleic acid either within the matrix of small beads or particles, or on the surface, Klein, et al., Nature,
327, 70-73, 1987 and Knudsen and Muller, 1991, Planta, 185:330-336 teaching particle bombardment of
barley endosperm to create transgenic barley. Yet another method of introduction would be fusion of
protoplasts with other entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies,
Fraley, et al., Proc. Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc. Natl. Acad.
Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the presence of plasmids
containing the gene construct. Electrical impulses of high field strength reversibly permeabilize
biomembranes allowing the introduction of the plasmids. Electroporated plant protoplasts reform the cell
wall, divide, and form plant callus.

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can be
transformed by the present invention so that whole plants are recovered which contain the transferred gene.
It is known that practically all plants can be regenerated from cultured cells or tissues, including but not
limited to all major species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables.
Some suitable plants include, for example, species from the genera Fragaria, Lotus, Medicago, Onobrychis,
Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica,
Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia,
Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis,
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia,
Glycine, Lolium, Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of transformed
protoplasts containing copies of the heterologous gene is first provided. Callus tissue is formed and shoots
may be induced from callus and subsequently rooted. Alternatively, embryo formation can be induced from
the protoplast suspension. These embryos germinate as natural embryos to form plants. The culture media
will generally contain various amino acids and hormones, such as auxin and cytokinins. It is also
advantageous to add glutamic acid and proline to the medium, especially for such species as corn and
alfalfa. Shoots and roots normally develop simultaneously. Efficient regeneration will depend on the
medium, on the genotype, and on the history of the culture. If these three variables are controlled, then
regeneration is fully reproducible and repeatable.

In some plant cell culture systems, the desired protein of the invention may be excreted or alternatively, the
protein may be extracted from the whole plant. Where the desired protein of the invention is secreted into
the medium, it may be collected. Alternatively, the embryos and embryoless-half seeds or other plant tissue
may be mechanically disrupted to release any secreted protein between cells and tissues. The mixture may
be suspended in a buffer solution to retrieve soluble proteins. Conventional protein isolation and
purification methods will be then used to purify the recombinant protein. Parameters of time, temperature,
pH, oxygen, and volumes will be adjusted through routine methods to optimize expression and recovery of
heterologous protein.

iv. Bacterial Systems

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of
binding bacterial RNA polymerase and initiating the downstream (3') transcription of a coding sequence
(eg. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually
placed proximal to the 5' end of the coding sequence. This transcription initiation region usually includes an
RNA polymerase binding site and a transcription initiation site. A bacterial promoter may also have a
second domain called an operator, that may overlap an adjacent RNA polymerase binding site at which
RNA synthesis begins. The operator permits negative regulated (inducible) transcription, as a gene
repressor protein may bind the operator and thereby inhibit transcription of a specific gene. Constitutive
expression may occur in the absence of negative regulatory elements, such as the operator. In addition,
positive regulation may be achieved by a gene activator protein binding sequence, which, if present, is
usually proximal (5') to the RNA polymerase binding sequence. An example of a gene activator protein is
the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli
(E. coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be either
positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples
include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac)
[Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include promoter sequences
derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al. (1980) Nuc. Acids Res. 8:4057;
Yelverton et al. (1981) Nucl. Acids Res. 9:731; US patent 4,738,921; EP-A-0036776 and EP-A-0121775].
The β-lactamase (bla) promoter system [Weissmann (1981) "The cloning of interferon and other mistakes."
In Interferon 3 (ed. I. Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5
[US patent 4,689,406] promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For
example, transcription activation sequences of one bacterial or bacteriophage promoter may be joined with
the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter
[US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac promoter comprised of both trp
promoter and lac operon sequences that is regulated by the lac repressor [Amann et al. (1983) Gene 25:167;
de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21]. Furthermore, a bacterial promoter can include naturally
occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and
initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a
compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The
bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system [Studier
et al. (1986) J. Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO-A-0 267
851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the
expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length
located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975) Nature 254:34]. The SD
sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD
sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979) "Genetic signals and nucleotide sequences
in messenger RNA." In Biological Regulation and Development: Gene Expression (ed. R.F. Goldberger)].
To express eukaryotic genes and prokaryotic genes with weak ribosome-binding site [Sambrook et al.
(1989) "Expression of cloned genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
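The geometry of the ribosome binding site described above (a short purine-rich sequence located 3-11 nucleotides upstream of the ATG) lends itself to a simple scan. The following is a rough sketch under stated assumptions, not a validated prediction method: the consensus `AGGAGG` is a commonly cited SD core and is not given in the text; the minimum match length of 4, the function names, and the convention that the "gap" counts the nucleotides lying between the 3' end of the SD match and the A of the ATG are all choices made for illustration.

```python
def sd_substrings(consensus, min_match):
    """All substrings of the consensus at least min_match long, longest first."""
    subs = {
        consensus[i:j]
        for i in range(len(consensus))
        for j in range(i + min_match, len(consensus) + 1)
    }
    return sorted(subs, key=lambda s: (-len(s), s))


def find_sd_sites(seq, consensus="AGGAGG", min_match=4, min_gap=3, max_gap=11):
    """Return (atg_index, sd_match, gap) for each ATG preceded by an SD-like motif.

    For every ATG, each allowed gap is tried and the longest consensus
    substring ending exactly `gap` nucleotides before the ATG is kept.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    subs = sd_substrings(consensus, min_match)
    sites = []
    atg = seq.find("ATG")
    while atg != -1:
        best = None
        for gap in range(min_gap, max_gap + 1):
            sd_end = atg - gap  # exclusive end of the candidate SD match
            for sub in subs:
                sd_start = sd_end - len(sub)
                if sd_start >= 0 and seq[sd_start:sd_end] == sub:
                    if best is None or len(sub) > len(best[0]):
                        best = (sub, gap)
                    break  # subs are longest-first, so stop at the first hit
        if best is not None:
            sites.append((atg, best[0], best[1]))
        atg = seq.find("ATG", atg + 1)
    return sites


if __name__ == "__main__":
    # Hypothetical leader: SD core, six spacer nucleotides, then the start codon.
    print(find_sd_sites("CCCAGGAGGCCCCCCATGGCT"))
```

A scan of this kind only flags candidate sites; the actual strength of a ribosome binding site also depends on spacing, secondary structure and sequence context, none of which is modelled here.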
A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the
DNA molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is
encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein
by in vitro incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial
methionine N-terminal peptidase (EPO-A-0 219 237).
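As a simple illustration of the cyanogen bromide step mentioned above, the sketch below splits a protein sequence after every methionine, reflecting the general chemistry that cyanogen bromide cleaves on the C-terminal side of methionine residues. The function name `cnbr_fragments` and the example sequences are hypothetical. The sketch ignores incomplete cleavage and the chemical conversion of the cleaved methionine, so it is only a first approximation.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M).

    Idealised model of cyanogen bromide treatment: every Met is cleaved on
    its C-terminal side and cleavage is assumed to go to completion.
    """
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:  # trailing fragment with no terminal Met
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Hypothetical sequences: one with only the initiator Met, one with an internal Met.
    print(cnbr_fragments("MKTAYIAK"))
    print(cnbr_fragments("MKTMAYIAK"))
```

In this idealised model the initiator methionine is released as its own fragment, while any internal methionine also splits the protein, which is one reason the enzymatic route with a methionine N-terminal peptidase may be preferred for proteins containing internal methionines.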

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid
sequences. For example, the bacteriophage lambda cell gene can be linked at the 5' terminus of a foreign
gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a processing
enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene [Nagai et al. (1984) Nature
309:810]. Fusion proteins can also be made with sequences from the lacZ [Jia et al. (1987) Gene 50:197],
trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al. (1989) J. Gen. Microbiol. 135:11], and Chey
[EP-A-0 324 647] genes. The DNA sequence at the junction of the two amino acid sequences may or may
not encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is made
with the ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin specific
processing-protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign
protein can be isolated [Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that
encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion of the
foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes a signal
peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The
protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space,
located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably there are
processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal peptide
fragment and the foreign gene.

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as
the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental Manipulation of
Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline phosphatase signal
sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an additional example, the signal
sequence of the alpha-amylase gene from various Bacillus strains can be used to secrete heterologous
proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA.
Transcription termination sequences frequently include DNA sequences of about 50 nucleotides capable of
forming stem loop structures that aid in terminating transcription. Examples include transcription
termination sequences derived from genes with strong promoters, such as the trp gene in E. coli as well as
other biosynthetic genes.

Usually, the above described components, comprising a promoter, signal sequence (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression constructs.
Expression constructs are often maintained in a replicon, such as an extrachromosomal element (eg.
plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will have a replication
system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning and
amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy
number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10
to about 150. A host containing a high copy number plasmid will preferably contain at least about 10, and
more preferably at least about 20 plasmids. Either a high or low copy number vector may be selected,
depending upon the effect of the vector and the foreign protein on the host.
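
The ordering of components in a bacterial expression construct, as described above, can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name ExpressionConstruct, its field names, and the placeholder sequences are hypothetical and do not correspond to any construct disclosed in this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpressionConstruct:
    """Hypothetical container for the parts of an expression construct."""

    promoter: str
    coding_sequence: str
    terminator: str
    signal_sequence: Optional[str] = None  # optional, "if desired"

    def assemble(self) -> str:
        """Join the parts 5' to 3': promoter, signal (if any), coding, terminator."""
        parts = [self.promoter]
        if self.signal_sequence:
            parts.append(self.signal_sequence)
        parts.append(self.coding_sequence)
        parts.append(self.terminator)
        return "".join(parts).upper()

    def starts_with_atg(self) -> bool:
        """Check that the translated region (signal + coding) begins with ATG."""
        orf = (self.signal_sequence or "") + self.coding_sequence
        return orf.upper().startswith("ATG")


if __name__ == "__main__":
    # Placeholder sequences, for illustration only.
    construct = ExpressionConstruct(
        promoter="ttgaca", coding_sequence="ATGAAATAA", terminator="gcgc"
    )
    print(construct.assemble(), construct.starts_with_atg())
```

Keeping the signal sequence optional mirrors the "if desired" wording; the promoter and terminator always flank the coding region.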

Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome
that allows the vector to integrate. Integrations appear to result from recombinations between homologous
DNA in the vector and the bacterial chromosome. For example, integrating vectors constructed with DNA
from various Bacillus strains integrate into the Bacillus chromosome (EP-A-0 127 328). Integrating vectors
may also be comprised of bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the
bacterial host and may include genes which render bacteria resistant to drugs such as ampicillin,
chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline [Davies et al. (1978) Annu. Rev.
Microbiol. 32:469]. Selectable markers may also include biosynthetic genes, such as those in the histidine,
tryptophan, and leucine biosynthetic pathways.

Alternatively, some of the above described components can be put together in transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors, have
been developed for transformation into many bacteria. For example, expression vectors have been
developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc. Natl. Acad. Sci.
USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia coli [Shimatake et al.
(1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986) J. Mol. Biol. 189:113;
EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907], Streptococcus cremoris [Powell et al. (1988) Appl.
Environ. Microbiol. 54:655], Streptococcus lividans [Powell et al. (1988) Appl. Environ. Microbiol.
54:655], Streptomyces lividans [US patent 4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include
either the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO.
DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary
with the bacterial species to be transformed. See eg. [Masson et al. (1989) FEMS Microbiol. Lett. 60:213;
Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO
84/04541, Bacillus], [Miller et al. (1988) Proc. Natl. Acad. Sci. 55:856; Wang et al. (1990) J. Bacteriol.
172:949, Campylobacter], [Cohen et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic
Acids Res. 16:6121; Kushner (1978) "An improved method for transformation of Escherichia coli with
ColE1-derived plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988)
Biochim. Biophys. Acta 949:318; Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett. 44:113,
Lactobacillus]; [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas]; [Augustin et al. (1990) FEMS
Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol. 144:698; Harlander (1987)
"Transformation of Streptococcus lactis by electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R.
Curtiss III); Perry et al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al. (1987) Proc. 4th Evr. Cong. Biotechnology 1:412, Streptococcus].

V. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA
sequence capable of binding yeast RNA polymerase and initiating the downstream (3') transcription of a
coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation region
which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region
usually includes an RNA polymerase binding site (the "TATA Box") and a transcription initiation site. A
yeast promoter may also have a second domain called an upstream activator sequence (UAS), which, if
present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression.
Constitutive expression occurs in the absence of a UAS. Regulated expression may be either positive or negative,
thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway, therefore sequences encoding enzymes in
the metabolic pathway provide particularly useful promoter sequences. Examples include alcohol
dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase,
3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203). The yeast PHO5 gene, encoding
acid phosphatase, also provides useful promoter sequences [Myanohara et al. (1983) Proc. Natl. Acad. Sci.
USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For
example, UAS sequences of one yeast promoter may be joined with the transcription activation region of
another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include
the ADH regulatory sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197
and 4,880,734). Other examples of hybrid promoters include promoters which consist of the regulatory
sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional
activation region of a glycolytic enzyme gene such as GAP or PyK (EP-A-0 164 556). Furthermore, a yeast
promoter can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast
RNA polymerase and initiate transcription. Examples of such promoters include, inter alia, [Cohen et al.
(1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981)
Curr. Topics Microbiol. Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic
Resistance Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene 11:163;
Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein
will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the
N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus,
and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an
endogenous yeast protein, or other stable protein, is fused to the 5' end of heterologous coding sequences.
Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the
yeast or human superoxide dismutase (SOD) gene can be linked at the 5' terminus of a foreign gene and
expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not
encode a cleavable site. See eg. EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a
fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (eg.
ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this
method, therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric
DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for
secretion in yeast of the foreign protein. Preferably, there are processing sites encoded between the leader
fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment
usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the
protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the
yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US patent 4,588,684).
Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that also provide for secretion
in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which
contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor fragments that can be
employed include the full-length pre-pro alpha factor leader (about 83 amino acid residues) as well as
truncated alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents 4,546,083 and
4,870,008; EP-A-0 324 274). Additional leaders employing an alpha-factor leader fragment that provides
for secretion include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region
from a second yeast alpha factor (eg. see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3' to the
translation stop codon, and thus together with the promoter flank the coding sequence. These sequences
direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA.
Examples include transcription terminator sequences and other yeast-recognized termination sequences, such as
those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of
interest, and transcription termination sequence, are put together into expression constructs. Expression
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) capable
of stable maintenance in a host, such as yeast or bacteria. The replicon may have two replication systems,
thus allowing it to be maintained, for example, in yeast for expression and in a prokaryotic host for cloning
and amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 [Botstein et al. (1979)
Gene 8:17-24], pCl/1 [Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17
[Stinchcomb et al. (1982) J. Mol. Biol. 158:151]. In addition, a replicon may be either a high or low copy
number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to
about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will
preferably have at least about 10, and more preferably at least about 20. Either a high or low copy number
vector may be selected, depending upon the effect of the vector and the foreign protein on the host. See eg.
Brake et al., supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector.
Integrating vectors usually contain at least one sequence homologous to a yeast chromosome that allows the
vector to integrate, and preferably contain two homologous sequences flanking the expression construct.
Integrations appear to result from recombinations between homologous DNA in the vector and the yeast
chromosome [Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245]. An integrating vector may be
directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in the
vector. See Orr-Weaver et al., supra. One or more expression constructs may integrate, possibly affecting
levels of recombinant protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The
chromosomal sequences included in the vector can occur either as a single segment in the vector, which
results in the integration of the entire vector, or two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable integration
of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow
for the selection of yeast strains that have been transformed. Selectable markers may include biosynthetic
genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2, TRP1, and ALG7, and the G418
resistance gene, which confer resistance in yeast cells to tunicamycin and G418, respectively. In addition, a
suitable selectable marker may also provide yeast with the ability to grow in the presence of toxic
compounds, such as metal. For example, the presence of CUP1 allows yeast to grow in the presence of
copper ions [Butt et al. (1987) Microbiol. Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation vectors.
Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon
or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been
developed for transformation into many yeasts. For example, expression vectors have been developed for,
inter alia, the following yeasts: Candida albicans [Kurtz et al. (1986) Mol. Cell. Biol. 6:142], Candida
maltosa [Kunze et al. (1985) J. Basic Microbiol. 25:141], Hansenula polymorpha [Gleeson et al. (1986) J.
Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302], Kluyveromyces fragilis
[Das et al. (1984) J. Bacteriol. 158:1165], Kluyveromyces lactis [De Louvencourt et al. (1983) J.
Bacteriol. 154:731; Van den Berg et al. (1990) Bio/Technology 8:135], Pichia guillerimondii [Kunze et al.
(1985) J. Basic Microbiol. 25:141], Pichia pastoris [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; US Patent
Nos. 4,837,148 and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse (1981)
Nature 300:706], and Yarrowia lipolytica [Davidow et al. (1985) Curr. Genet. 10:380471; Gaillardin et al.
(1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include
either the transformation of spheroplasts or of intact yeast cells treated with alkali cations. Transformation
procedures usually vary with the yeast species to be transformed. See eg. [Kurtz et al. (1986) Mol. Cell.
Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida]; [Gleeson et al. (1986) J. Gen.
Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302; Hansenula]; [Das et al. (1984)
J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J. Bacteriol. 154:1165; Van den Berg et al. (1990)
Bio/Technology 8:135; Kluyveromyces]; [Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J.
Basic Microbiol. 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc.
Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse
(1981) Nature 300:706; Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49; Yarrowia].

Antibodies

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of at least
one antibody combining site. An "antibody combining site" is the three-dimensional binding space with an
internal surface shape and charge distribution complementary to the features of an epitope of an antigen,
which allows a binding of the antibody with the antigen. "Antibody" includes, for example, vertebrate
antibodies, hybrid antibodies, chimeric antibodies, humanised antibodies, altered antibodies, univalent
antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, immunoassays, and
distinguishing/identifying Streptococcal proteins.

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably a
mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the
volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies.
Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an
adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally
subcutaneously or intramuscularly). A dose of 50-200 μg/injection is typically sufficient. Immunization is
generally boosted 2-6 weeks later with one or more injections of the protein in saline, preferably using
Freund's incomplete adjuvant. One may alternatively generate antibodies by in vitro immunization using
methods known in the art, which for the purposes of this invention is considered equivalent to in vivo
immunization. Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic
container, incubating the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The
serum is recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be
obtained from rabbits.






Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature (1975)
256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described above.
However, rather than bleeding the animal to extract serum, the spleen (and optionally several large lymph
nodes) is removed and dissociated into single cells. If desired, the spleen cells may be screened (after
removal of nonspecifically adherent cells) by applying a cell suspension to a plate or well coated with the
protein antigen. B-cells expressing membrane-bound immunoglobulin specific for the antigen bind to the
plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen
cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective
medium (eg. hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated
by limiting dilution, and are assayed for production of antibodies which bind specifically to the immunizing
antigen (and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and
125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are typically
detected by their activity. For example, horseradish peroxidase is usually detected by its ability to convert
3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer. "Specific
binding partner" refers to a protein capable of binding a ligand molecule with high specificity, as for
example in the case of an antigen and a monoclonal antibody specific therefor. Other specific binding
partners include biotin and avidin or streptavidin, IgG and protein A, and the numerous receptor-ligand
couples known in the art. It should be understood that the above description is not meant to categorize the
various labels into distinct classes, as the same label may serve in several different modes. For example,
125I may serve as a radioactive label or as an electron-dense reagent. HRP may serve as enzyme or as antigen for
a MAb. Further, one may combine various labels for desired effect. For example, MAbs and avidin also
require labels in the practice of this invention: thus, one might label a MAb with biotin, and detect its
presence with avidin labeled with 125I, or with an anti-biotin MAb labeled with HRP. Other permutations
and possibilities will be readily apparent to those of ordinary skill in the art, and are considered as
equivalents within the scope of the instant invention.

Pharmaceutical Compositions 
Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the invention.
The pharmaceutical compositions will comprise a therapeutically effective amount of either polypeptides,
antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic agent to
treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable therapeutic or
preventative effect. The effect can be detected by, for example, chemical markers or antigen levels.
Therapeutic effects also include reduction in physical symptoms, such as decreased body temperature. The
precise effective amount for a subject will depend upon the subject's size and health, the nature and extent
of the condition, and the therapeutics or combination of therapeutics selected for administration. Thus, it is
not useful to specify an exact effective amount in advance. However, the effective amount for a given
situation can be determined by routine experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg or 0.05
mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is administered.
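
The per-kilogram ranges stated above translate into absolute amounts by simple multiplication with body mass. The following is a minimal arithmetic sketch only; the function name dose_bounds_mg and the 70 kg example mass are assumptions for illustration, and the result is not dosing guidance, since the text leaves the effective amount to routine experimentation and the judgement of the clinician.

```python
# Ranges quoted in the text, in mg of DNA construct per kg of body mass.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_bounds_mg(body_mass_kg, range_mg_per_kg=BROAD_RANGE_MG_PER_KG):
    """Return (lower, upper) absolute dose bounds in mg for a given body mass."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    low, high = range_mg_per_kg
    return (low * body_mass_kg, high * body_mass_kg)


if __name__ == "__main__":
    # Hypothetical 70 kg individual.
    print(dose_bounds_mg(70.0))
    print(dose_bounds_mg(70.0, NARROW_RANGE_MG_PER_KG))
```

For a 70 kg individual this gives roughly 0.7 mg to 3500 mg for the broader range and roughly 3.5 mg to 700 mg for the narrower one.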

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, such as
antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any pharmaceutical
carrier that does not itself induce the production of antibodies harmful to the individual receiving the
composition, and which may be administered without undue toxicity. Suitable carriers may be large, slowly
metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids,
polymeric amino acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to
those of ordinary skill in the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as 
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids such as 
acetates, propionates, malonates, benzoates, and the like. A thorough discussion of pharmaceutically 
acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack Pub. Co., N.J. 1991).
Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, saline, 
glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH 
buffering substances, and the like, may be present in such vehicles. Typically, the therapeutic compositions 
are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or 
suspension in, liquid vehicles prior to injection may also be prepared. Liposomes are included within the 
definition of a pharmaceutically acceptable carrier. 
Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The subjects
to be treated can be animals; in particular, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (eg. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

See also Delivery Strategies for Antisense Oligonucleotide Therapeutics (ed. Akhtar) ISBN 0849347785. 
Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or therapeutic (ie.
to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid,
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does not
itself induce the production of antibodies harmful to the individual receiving the composition. Suitable
carriers are typically large, slowly metabolized macromolecules such as proteins, polysaccharides,
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, lipid aggregates (such
as oil droplets or liposomes), and inactive virus particles. Such carriers are well known to those of ordinary
skill in the art. Additionally, these carriers may function as immunostimulating agents ("adjuvants").
Furthermore, the antigen or immunogen may be conjugated to a bacterial toxoid, such as a toxoid from
diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Vaccines of the invention may be administered in conjunction with other immunoregulatory 
agents. In particular, compositions will usually include an adjuvant. 

Preferred further adjuvants include, but are not limited to, one or more of the following set forth
below:

A. Mineral Containing Compositions

Mineral containing compositions suitable for use as adjuvants in the invention include mineral
salts, such as aluminium salts and calcium salts. The invention includes mineral salts such as
hydroxides (e.g. oxyhydroxides), phosphates (e.g. hydroxyphosphates, orthophosphates),
sulphates, etc. (e.g. see chapters 8 & 9 of ref. 1), or mixtures of different mineral compounds,
with the compounds taking any suitable form (e.g. gel, crystalline, amorphous, etc.), and with
adsorption being preferred. The mineral containing compositions may also be formulated as a
particle of metal salt. See ref. 2.

B. Oil-Emulsions 

Oil-emulsion compositions suitable for use as adjuvants in the invention include squalene-water
emulsions, such as MF59 (5% Squalene, 0.5% Tween 80, and 0.5% Span 85, formulated into
submicron particles using a microfluidizer). See ref. 3.

Complete Freund's adjuvant (CFA) and incomplete Freund's adjuvant (IFA) may also be used as
adjuvants in the invention.

C. Saponin Formulations

Saponin formulations may also be used as adjuvants in the invention. Saponins are a
heterologous group of sterol glycosides and triterpenoid glycosides that are found in the bark,
leaves, stems, roots and even flowers of a wide range of plant species. Saponins from the bark of
the Quillaia saponaria Molina tree have been widely studied as adjuvants. Saponin can also be
commercially obtained from Smilax ornata (sarsaparilla), Gypsophila paniculata (bride's veil),
and Saponaria officinalis (soap root). Saponin adjuvant formulations include purified
formulations, such as QS21, as well as lipid formulations, such as ISCOMs.

Saponin compositions have been purified using High Performance Thin Layer Chromatography
(HP-LC) and Reversed Phase High Performance Liquid Chromatography (RP-HPLC). Specific
purified fractions using these techniques have been identified, including QS7, QS17, QS18,
QS21, QH-A, QH-B and QH-C. Preferably, the saponin is QS21. A method of production of
QS21 is disclosed in U.S. Patent No. 5,057,540. Saponin formulations may also comprise a
sterol, such as cholesterol (see WO 96/33739).

Combinations of saponins and cholesterols can be used to form unique particles called
Immunostimulating Complexes (ISCOMs). ISCOMs typically also include a phospholipid such
as phosphatidylethanolamine or phosphatidylcholine. Any known saponin can be used in
ISCOMs. Preferably, the ISCOM includes one or more of Quil A, QHA and QHC. ISCOMs are
further described in EP 0 109 942, WO 96/11711 and WO 96/33739. Optionally, the ISCOMs
may be devoid of additional detergent. See ref. 4.

A review of the development of saponin based adjuvants can be found at ref. 5. 

C. Virosomes and Virus Like Particles (VLPs)

Virosomes and Virus Like Particles (VLPs) can also be used as adjuvants in the invention.
These structures generally contain one or more proteins from a virus optionally combined or
formulated with a phospholipid. They are generally non-pathogenic, non-replicating and
generally do not contain any of the native viral genome. The viral proteins may be recombinantly
produced or isolated from whole viruses. These viral proteins suitable for use in virosomes or
VLPs include proteins derived from influenza virus (such as HA or NA), Hepatitis B virus (such
as core or capsid proteins), Hepatitis E virus, measles virus, Sindbis virus, Rotavirus, Foot-and-Mouth
Disease virus, Retrovirus, Norwalk virus, human Papilloma virus, HIV, RNA-phages,
Qβ-phage (such as coat proteins), GA-phage, fr-phage, AP205 phage, and Ty (such as
retrotransposon Ty protein p1). VLPs are discussed further in WO 03/024480, WO 03/024481,
and Refs. 6, 7, 8 and 9. Virosomes are discussed further in, for example, Ref. 10.

D. Bacterial or Microbial Derivatives

Adjuvants suitable for use in the invention include bacterial or microbial derivatives such as:

(1) Non-toxic derivatives of enterobacterial lipopolysaccharide (LPS)

Such derivatives include Monophosphoryl lipid A (MPL) and 3-O-deacylated MPL (3dMPL).
3dMPL is a mixture of 3 De-O-acylated monophosphoryl lipid A with 4, 5 or 6 acylated chains.
A preferred "small particle" form of 3 De-O-acylated monophosphoryl lipid A is disclosed in EP
0 689 454. Such "small particles" of 3dMPL are small enough to be sterile filtered through a
0.22 micron membrane (see EP 0 689 454). Other non-toxic LPS derivatives include
monophosphoryl lipid A mimics, such as aminoalkyl glucosaminide phosphate derivatives e.g.
RC-529. See Ref. 11.

(2) Lipid A Derivatives

Lipid A derivatives include derivatives of lipid A from Escherichia coli such as OM-174.
OM-174 is described for example in Ref. 12 and 13.

(3) Immunostimulatory oligonucleotides

Immunostimulatory oligonucleotides suitable for use as adjuvants in the invention include
nucleotide sequences containing a CpG motif (a sequence containing an unmethylated cytosine
followed by guanosine and linked by a phosphate bond). Bacterial double stranded RNA or
oligonucleotides containing palindromic or poly(dG) sequences have also been shown to be
immunostimulatory.

The CpG's can include nucleotide modifications/analogs such as phosphorothioate modifications
and can be double-stranded or single-stranded. Optionally, the guanosine may be replaced with
an analog such as 2'-deoxy-7-deazaguanosine. See ref. 14, WO 02/26757 and WO 99/62923 for
examples of possible analog substitutions. The adjuvant effect of CpG oligonucleotides is further
discussed in Refs. 15, 16, WO 98/40100, U.S. Patent No. 6,207,646, U.S. Patent No. 6,239,116,
and U.S. Patent No. 6,429,199.

The CpG sequence may be directed to TLR9, such as the motif GTCGTT or TTCGTT. See ref.
17. The CpG sequence may be specific for inducing a Th1 immune response, such as a CpG-A
ODN, or it may be more specific for inducing a B cell response, such as a CpG-B ODN. CpG-A
and CpG-B ODNs are discussed in refs. 18, 19 and WO 01/95935. Preferably, the CpG is a
CpG-A ODN.
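By way of illustration only, the sketch below scans a candidate oligonucleotide for CG dinucleotides and for the two TLR9-directed motifs named above. The function names are hypothetical and not part of the disclosure, and the input is assumed to be a plain DNA string; a sequence string carries no methylation information, so a match indicates only a candidate CpG motif.

```python
import re

# Motifs named in the text as examples of TLR9-directed CpG sequences.
TLR9_MOTIFS = ("GTCGTT", "TTCGTT")


def find_cpg_positions(seq):
    """Return the 0-based index of every CG dinucleotide in a DNA string."""
    seq = seq.upper()
    return [m.start() for m in re.finditer("CG", seq)]


def find_tlr9_motifs(seq):
    """Return sorted (index, motif) pairs for each TLR9 motif occurrence.

    A lookahead pattern is used so that overlapping occurrences are reported.
    """
    seq = seq.upper()
    hits = []
    for motif in TLR9_MOTIFS:
        for m in re.finditer("(?=" + motif + ")", seq):
            hits.append((m.start(), motif))
    return sorted(hits)
```

A caller might first use `find_tlr9_motifs` to screen candidates and then confirm the methylation state experimentally, which this sketch cannot do.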

Preferably, the CpG oligonucleotide is constructed so that the 5' end is accessible for receptor
recognition. Optionally, two CpG oligonucleotide sequences may be attached at their 3' ends to
form "immunomers". See, for example, refs. 20, 21, 22 and WO 03/035836.

(4) ADP-ribosylating toxins and detoxified derivatives thereof.

Bacterial ADP-ribosylating toxins and detoxified derivatives thereof may be used as adjuvants in
the invention. Preferably, the protein is derived from E. coli (i.e., E. coli heat labile enterotoxin
"LT"), cholera ("CT"), or pertussis ("PT"). The use of detoxified ADP-ribosylating toxins as
mucosal adjuvants is described in WO 95/17211 and as parenteral adjuvants in WO 98/42375.

The toxin or toxoid is preferably in the form of a holotoxin, comprising both A and B subunits.
Preferably, the A subunit contains a detoxifying mutation; preferably the B subunit is not
mutated. Preferably, the adjuvant is a detoxified LT mutant such as LT-K63, LT-R72, and
LTR192G. The use of ADP-ribosylating toxins and detoxified derivatives thereof, particularly
LT-K63 and LT-R72, as adjuvants can be found in Refs. 23, 24, 25, 26, 27, 28, 29 and 30, each
of which is specifically incorporated by reference herein in its entirety. Numerical reference
for amino acid substitutions is preferably based on the alignments of the A and B subunits of
ADP-ribosylating toxins set forth in Domenighini et al., Mol. Microbiol (1995) 11(6):1165-1167,
specifically incorporated herein by reference in its entirety.

E. Human Immunomodulators

Human immunomodulators suitable for use as adjuvants in the invention include cytokines, such
as interleukins (e.g. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g. interferon-γ),
macrophage colony stimulating factor, and tumor necrosis factor.

F. Bioadhesives and Mucoadhesives 

Bioadhesives and mucoadhesives may also be used as adjuvants in the invention. Suitable
bioadhesives include esterified hyaluronic acid microspheres (Ref. 31) or mucoadhesives such as
cross-linked derivatives of poly(acrylic acid), polyvinyl alcohol, polyvinyl pyrrolidone,
polysaccharides and carboxymethylcellulose. Chitosan and derivatives thereof may also be used
as adjuvants in the invention. E.g., ref. 32.

G. Microparticles

Microparticles may also be used as adjuvants in the invention. Microparticles (i.e. a particle of
~100 nm to ~150 μm in diameter, more preferably ~200 nm to ~30 μm in diameter, and most
preferably ~500 nm to ~10 μm in diameter) formed from materials that are biodegradable and
non-toxic (e.g. a poly(α-hydroxy acid), a polyhydroxybutyric acid, a polyorthoester, a
polyanhydride, a polycaprolactone, etc.), with poly(lactide-co-glycolide) are preferred,
optionally treated to have a negatively-charged surface (e.g. with SDS) or a positively-charged
surface (e.g. with a cationic detergent, such as CTAB).
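The nested diameter ranges recited above can be sketched as a simple lookup, shown here for illustration only. The function name and labels are hypothetical; the bounds are the approximate ("~") values from the text, converted to nanometres and treated as inclusive, which is an assumption of this sketch.

```python
# Approximate diameter ranges from the text, in nanometres (1 um = 1000 nm).
# Listed from narrowest to widest so the first match is the most specific.
RANGES_NM = [
    ("most preferred", 500.0, 10_000.0),
    ("more preferred", 200.0, 30_000.0),
    ("within range", 100.0, 150_000.0),
]


def classify_diameter(diameter_nm):
    """Return the narrowest recited range that contains the diameter."""
    for label, low, high in RANGES_NM:
        if low <= diameter_nm <= high:
            return label
    return "outside range"
```

Because the ranges are nested, checking them from narrowest to widest returns the most specific category without any further comparison.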

H. Liposomes 

Examples of liposome formulations suitable for use as adjuvants are described in U.S. Patent No.
6,090,406, U.S. Patent No. 5,916,588, and EP 0 626 169.

I. Polyoxyethylene Ether and Polyoxyethylene Ester Formulations

Adjuvants suitable for use in the invention include polyoxyethylene ethers and polyoxyethylene
esters. Ref. 33. Such formulations further include polyoxyethylene sorbitan ester surfactants in
combination with an octoxynol (Ref. 34) as well as polyoxyethylene alkyl ethers or ester
surfactants in combination with at least one additional non-ionic surfactant such as an octoxynol
(Ref. 35).

Preferred polyoxyethylene ethers are selected from the following group: polyoxyethylene-9-lauryl
ether (laureth 9), polyoxyethylene-9-stearyl ether, polyoxyethylene-8-stearyl ether,
polyoxyethylene-4-lauryl ether, polyoxyethylene-35-lauryl ether, and polyoxyethylene-23-lauryl
ether.

J. Polyphosphazene (PCPP)

PCPP formulations are described, for example, in Ref. 36 and 37.

K. Muramyl peptides

Examples of muramyl peptides suitable for use as adjuvants in the invention include
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine
(nor-MDP), and N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE).

L. Imidazoquinolone Compounds

Examples of imidazoquinolone compounds suitable for use as adjuvants in the invention include
imiquimod and its homologues, described further in Ref. 38 and 39.

The invention may also comprise combinations of aspects of one or more of the adjuvants
identified above. For example, the following adjuvant compositions may be used in the
invention:

(1) a saponin and an oil-in-water emulsion (ref. 40);

(2) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) (see WO
94/00153);

(3) a saponin (e.g., QS21) + a non-toxic LPS derivative (e.g., 3dMPL) + a
cholesterol;

(4) a saponin (e.g. QS21) + 3dMPL + IL-12 (optionally + a sterol) (Ref. 41);
combinations of 3dMPL with, for example, QS21 and/or oil-in-water emulsions (Ref. 42);

(5) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-block polymer
L121, and thr-MDP, either microfluidized into a submicron emulsion or vortexed to generate a
larger particle size emulsion;

(6) Ribi™ adjuvant system (RAS), (Ribi Immunochem) containing 2% Squalene,
0.2% Tween 80, and one or more bacterial cell wall components from the group consisting of
monophosphorylipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS),
preferably MPL + CWS (Detox™); and

(7) one or more mineral salts (such as an aluminum salt) + a non-toxic derivative of
LPS (such as 3dMPL).

Aluminium salts and MF59 are preferred adjuvants for parenteral immunisation. Mutant bacterial 
toxins are preferred mucosal adjuvants. 

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/nucleic acid,
pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such as water, saline,
glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH
buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also
be prepared. The preparation also may be emulsified or encapsulated in liposomes for enhanced adjuvant
effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components, as
needed. By "immunologically effective amount", it is meant that the administration of that amount to an
individual, either in a single dose or as part of a series, is effective for treatment or prevention. This amount
varies depending upon the health and physical condition of the individual to be treated, the taxonomic group
of individual to be treated (eg. nonhuman primate, primate, etc.), the capacity of the individual's immune
system to synthesize antibodies, the degree of protection desired, the formulation of the vaccine, the treating
doctor's assessment of the medical situation, and other relevant factors. It is expected that the amount will
fall in a relatively broad range that can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection, either
subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734). Additional formulations
suitable for other modes of administration include oral and pulmonary formulations, suppositories, and
transdermal applications. Dosage treatment may be a single dose schedule or a multiple dose schedule. The
vaccine may be administered in conjunction with other immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be used [eg. Robinson & Torres (1997)
Seminars in Immunol 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 15:617-648; later herein].

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the
invention, to be delivered to the mammal for expression in the mammal, can be administered either locally
or systemically. These constructs can utilize viral or non-viral vector approaches in in vivo or ex vivo
modality. Expression of such coding sequence can be induced using endogenous mammalian or
heterologous promoters. Expression of the coding sequence in vivo can be either constitutive or regulated.
The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can also be an
astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, picornavirus, poxvirus,
or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy 1:51-64; Kimura (1994) Human
Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy 6:185-193; and Kaplitt (1994) Nature
Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy vector is
employable in the invention, including B, C and D type retroviruses, xenotropic retroviruses (for example,
NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)), polytropic retroviruses eg. MCF and
MCF-MLV (see Kelly (1983) J. Virol. 45:291), spumaviruses and lentiviruses. See RNA Tumor Viruses,
Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For example, 
retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site from a Rous Sarcoma 
Virus, a packaging signal from a Murine Leukemia Virus, and an origin of second strand synthesis from an 
Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral vector
particles by introducing them into appropriate packaging cell lines (see US patent 5,591,624). Retrovirus
vectors can be constructed for site-specific integration into host cell DNA by incorporation of a chimeric
integrase enzyme into the retroviral particle (see WO96/37626). It is preferable that the recombinant viral
vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known in the art,
are readily prepared (see WO95/30763 and WO92/05266), and can be used to create producer cell lines
(also termed vector cell lines or "VCLs") for the production of recombinant vector particles. Preferably, the
packaging cell lines are made from human parent cells (eg. HT1080 cells) or mink parent cell lines, which
eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian Leukosis Virus,
Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing Virus, Murine Sarcoma
Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly preferred Murine Leukemia
Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol 19:19-25), Abelson (ATCC No.
VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No. VR-590), Kirsten, Harvey Sarcoma Virus
and Rauscher (ATCC No. VR-998) and Moloney Murine Leukemia Virus (ATCC No. VR-190). Such
retroviruses may be obtained from depositories or collections such as the American Type Culture Collection
("ATCC") in Rockville, Maryland or isolated from known sources using commonly available techniques.
Exemplary known retroviral gene therapy vectors employable in this invention include those described in
patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468, WO89/05349,
WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698, WO93/25234, WO93/11230,
WO93/10218, WO91/02805, WO91/02825, WO95/07994, US 5,219,740, US 4,405,712, US 4,861,719, US
4,980,289, US 4,777,127, US 5,591,624. See also Vile (1993) Cancer Res 53:3860-3864; Vile (1993)
Cancer Res 53:962-967; Ram (1993) Cancer Res 53:83-88; Takamiya (1992) J Neurosci Res
33:493-503; Baba (1993) J Neurosurg 79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad
Sci 81:6349; and Miller (1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention. See, for
example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and WO93/07283,
WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors employable in this
invention include those described in the above referenced documents and in WO94/12649, WO93/03769,
WO93/19191, WO94/28938, WO95/11984, WO95/00655, WO95/27071, WO95/29993, WO95/34671,
WO96/05320, WO94/08026, WO94/11506, WO93/06223, WO94/24299, WO95/14102, WO95/24297,
WO95/02697, WO94/28152, WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and
WO95/09654. Alternatively, administration of DNA linked to killed adenovirus as described in Curiel
(1992) Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such vectors for use
in this invention are the AAV-2 based vectors disclosed in Srivastava, WO93/09239. Most preferred AAV
vectors comprise the two AAV inverted terminal repeats in which the native D-sequences are modified by
substitution of nucleotides, such that at least 5 native nucleotides and up to 18 native nucleotides, preferably
at least 10 native nucleotides up to 18 native nucleotides, most preferably 10 native nucleotides are retained
and the remaining nucleotides of the D-sequence are deleted or replaced with non-native nucleotides. The
native D-sequences of the AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in
each AAV inverted terminal repeat (i.e. there is one sequence at each end) which are not involved in HP
formation. The non-native replacement nucleotide may be any nucleotide other than the nucleotide found in
the native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of such an
AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV vector is the
Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US Patent 5,478,745. Still
other vectors are those disclosed in Carter US Patent 4,797,368 and Muzyczka US Patent 5,139,941,
Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a further example of an AAV vector
employable in this invention is SSV9AFABTKneo, which contains the AFP enhancer and albumin
promoter and directs expression predominantly in the liver. Its structure and construction are disclosed in Su
(1996) Human Gene Therapy 7:463-470. Additional AAV gene therapy vectors are described in US
5,354,678, US 5,173,414, US 5,139,941, and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred examples are
herpes simplex virus vectors containing a sequence encoding a thymidine kinase polypeptide such as those
disclosed in US 5,288,641 and EP0176170 (Roizman). Additional exemplary herpes simplex virus vectors
include HFEM/ICP6-LacZ disclosed in WO95/04139 (Wistar Institute), pHSVlac described in Geller
(1988) Science 241:1667-1669 and in WO90/09441 and WO92/07945, HSV Us3::pgC-lacZ described in
Fink (1992) Human Gene Therapy 3:11-19 and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242
(Breakefield), and those deposited with the ATCC with accession numbers VR-977 and VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention. Preferred
alpha virus vectors are Sindbis virus vectors. Togaviruses, Semliki Forest virus (ATCC VR-67; ATCC
VR-1247), Middleberg virus (ATCC VR-370), Ross River virus (ATCC VR-373; ATCC VR-1246),
Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250; ATCC VR-1249; ATCC VR-532),
and those described in US patents 5,091,309, 5,217,879, and WO92/10578. More particularly, those alpha
virus vectors described in US Serial No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578,
WO95/07994, US 5,091,309 and US 5,217,879 are employable. Such alpha viruses may be obtained from
depositories or collections such as the ATCC in Rockville, Maryland or isolated from known sources using
commonly available techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see
USSN 08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing the
nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered expression
systems. Preferably, the eukaryotic layered expression systems of the invention are derived from alphavirus
vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus, for
example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) J. Biol.
Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in Arnold (1990) J
Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for example ATCC VR-111 and
ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl Acad Sci 86:317; Flexner (1989)
NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 4,603,112 and US 4,769,330 and WO89/01973;
SV40 virus, for example ATCC VR-305 and those described in Mulligan (1979) Nature 277:108 and
Madzak (1992) J Gen Virol 73:1533; influenza virus, for example ATCC VR-797 and recombinant
influenza viruses made employing reverse genetics techniques as described in US 5,166,057 and in Enami
(1990) Proc Natl Acad Sci 87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989)
Cell 59:110, (see also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature
(1979) 277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992)

J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 and ATCC
VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for example ATCC VR-64 and
ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; Getah virus, for example ATCC VR-369
and ATCC VR-1243; Kyzylagach virus, for example ATCC VR-927; Mayaro virus, for example ATCC
VR-66; Mucambo virus, for example ATCC VR-580 and ATCC VR-1244; Ndumu virus, for example
ATCC VR-371; Pixuna virus, for example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example
ATCC VR-925; Triniti virus, for example ATCC VR-469; Una virus, for example ATCC VR-374;
Whataroa virus, for example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus,
Eastern encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for
example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus, for
example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral vectors.
Other delivery methods and media may be employed such as, for example, nucleic acid expression vectors,
polycationic condensed DNA linked or unlinked to killed adenovirus alone, for example see US Serial No.
08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther 3:147-154, ligand linked DNA, for
example see Wu (1989) J Biol Chem 264:16985-16987, eucaryotic cell delivery vehicles cells, for example
see US Serial No. 08/240,030, filed May 9, 1994, and US Serial No. 08/404,796, deposition of
photopolymerized hydrogel materials, hand-held gene transfer particle gun, as described in US Patent
5,149,655, ionizing radiation as described in US 5,206,152 and in WO92/11033, nucleic charge
neutralization or fusion with cell membranes. Additional approaches are described in Philip (1994) Mol Cell
Biol 14:2411-2418 and in Woffendin (1994) Proc Natl Acad Sci 91:1581-1585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. Briefly, the
sequence can be inserted into conventional vectors that contain conventional control sequences for high
level expression, and then incubated with synthetic gene transfer molecules such as polymeric
DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting ligands such as
asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem. 262:4429-4432, insulin as described in
Hucked (1990) Biochem Pharmacol 40:253-263, galactose as described in Plank (1992) Bioconjugate Chem
3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in WO
90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex beads. DNA
coated latex beads are efficiently transported into cells after endocytosis initiation by the beads. The method
may be improved further by treatment of the beads to increase hydrophobicity and thereby facilitate
disruption of the endosome and release of the DNA into the cytoplasm.

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN. 60/023,867, on non-viral delivery, the
nucleic acid sequences encoding a polypeptide can be inserted into conventional vectors that contain
conventional control sequences for high level expression, and then be incubated with synthetic gene transfer
molecules such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell
targeting ligands such as asialoorosomucoid, insulin, galactose, lactose, or transferrin. Other delivery
systems include the use of liposomes to encapsulate DNA comprising the gene under the control of a
variety of tissue-specific or ubiquitously-active promoters. Further non-viral delivery suitable for use
includes mechanical delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl.
Acad. Sci. USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of such
can be delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for activating
transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 and
4,762,915; in WO 95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, Biochemistry,
pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem Biophys Acta 600:1; Bayer
(1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 149:119; Wang (1987) Proc Natl
Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy vehicle, as
the term is defined above. For purposes of the present invention, an effective dose will be from about 0.01
mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered.
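As a worked illustration of the per-kilogram ranges above, the following sketch converts a mg/kg range into a total amount of DNA construct for a given body weight. The function name and the example body weight are hypothetical; the default bounds are the broader range recited in the text, and the narrower range can be passed explicitly. It is an arithmetic aid only, not a dosing recommendation.

```python
def dose_range_mg(weight_kg, low_mg_per_kg=0.01, high_mg_per_kg=50.0):
    """Return (low, high) total dose in mg for a body weight in kg.

    Defaults are the broader range from the text (about 0.01 to 50 mg/kg);
    pass 0.05 and 10.0 for the narrower range.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low bound exceeds high bound")
    return (weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg)


# Example: broader and narrower ranges for an assumed 70 kg individual.
broad = dose_range_mg(70.0)
narrow = dose_range_mg(70.0, 0.05, 10.0)
```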
Delivery Methods

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly to the
subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression of recombinant
proteins. The subjects to be treated can be mammals or birds. Also, human subjects can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either subcutaneously,
intraperitoneally, intravenously or intramuscularly or delivered to the interstitial space of a tissue. The
compositions can also be administered into a lesion. Other modes of administration include oral and
pulmonary administration, suppositories, and transdermal or transcutaneous applications (eg. see
WO98/20734), needles, and gene guns or hyposprays. Dosage treatment may be a single dose schedule or a
multiple dose schedule.

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known in the art
and described in eg. WO93/14778. Examples of cells useful in ex vivo applications include, for example,
stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or tumor cells.

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished by the
following procedures, for example, dextran-mediated transfection, calcium phosphate precipitation,
polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s)
in liposomes, and direct microinjection of the DNA into nuclei, all well known in the art.

Polynucleotide and polypeptide pharmaceutical compositions

The terms "polynucleotide" and "nucleic acid", used interchangeably herein,

In addition to the pharmaceutically acceptable carriers and salts described above, the following additional
agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR); transferrin;
asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF), granulocyte colony stimulating factor (G-CSF),
macrophage colony stimulating factor (M-CSF), stem cell factor and erythropoietin. Viral antigens, such as
envelope proteins, can also be used. Proteins from other invasive organisms can also be used, such as the
17 amino acid peptide from the circumsporozoite protein of Plasmodium falciparum known as RII.
B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, thyroid
hormone, or vitamins, folic acid.
C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a preferred
embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or polysaccharides
can be included. In a preferred embodiment of this aspect, the polysaccharide is dextran or DEAE-dextran.
Also, chitosan and poly(lactide-co-glycolide) can be included.
D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes prior to
delivery to the subject or to cells derived therefrom.

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or entrap and
retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary but will generally be
around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as carriers
for delivery of nucleic acids, see Hug and Sleight (1991) Biochim. Biophys. Acta 1097:1-17; Straubinger
(1983) Meth. Enzymol. 101:512-527.

Liposomal preparations for use in the present invention include cationic (positively charged), anionic
(negatively charged) and neutral preparations. Cationic liposomes have been shown to mediate intracellular
delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 84:7413-7416); mRNA (Malone
(1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified transcription factors (Debs (1990) J. Biol.
Chem. 265:10189-10192), in functional form.

Cationic liposomes are readily available. For example,
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium (DOTMA) liposomes are available under the
trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See, also, Felgner supra). Other
commercially available liposomes include Transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer).
Other cationic liposomes can be prepared from readily available materials using techniques well known in
the art. See, eg. Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of
the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.
Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials include
phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline (DOPC),
dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), among others. These
materials can also be mixed with the DOTMA and DOTAP starting materials in appropriate ratios. Methods
for making liposomes using these materials are well known in the art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), or large
unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared using methods
known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka (1978) Proc. Natl. Acad.
Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 394:483; Wilson (1979) Cell
17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; Ostro (1977) Biochem. Biophys. Res.
Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA 76:3348; Enoch & Strittmatter (1979) Proc.
Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol. Chem. 255:10431; Szoka & Papahadjopoulos
(1978) Proc. Natl. Acad. Sci. USA 75:145; and Schaefer-Ridder (1982) Science 215:166.
E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. Examples of
lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. Mutants, fragments, or
fusions of these proteins can also be used. Also, modifications of naturally occurring lipoproteins can be
used, such as acetylated LDL. These lipoproteins can target the delivery of polynucleotides to cells
expressing lipoprotein receptors. Preferably, if lipoproteins are included with the polynucleotide to be
delivered, no other targeting ligand is included in the composition.

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are known as
apoproteins. At present, apoproteins A, C, D, and E have been isolated and identified. At least two of
these contain several proteins, designated by Roman numerals: AI, AII, AIV; CI, CII, CIII.
A lipoprotein can comprise more than one apoprotein. For example, naturally occurring chylomicrons
comprise A, B, C & E; over time these lipoproteins lose A and acquire C & E. VLDL comprises A, B, C
& E apoproteins, LDL comprises apoprotein B; and HDL comprises apoproteins A, C, & E.
The amino acid sequences of these apoproteins are known and are described in, for example, Breslow (1985) Annu
Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986) J Biol Chem 261:12918;
Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum Genet 65:232.
Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of naturally
occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The composition of the
lipids is chosen to aid in conformation of the apoprotein for receptor binding activity. The composition of
lipids can also be chosen to facilitate hydrophobic interaction and association with the polynucleotide
binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. Such
methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460 and Mahey
(1979) J. Clin. Invest. 64:743-750. Lipoproteins can also be produced by in vitro or recombinant methods by
expression of the apoprotein genes in a desired host cell. See, for example, Atkinson (1986) Annu Rev
Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta 30: 443. Lipoproteins can also be
purchased from commercial suppliers, such as Biomedical Technologies, Inc., Stoughton, Massachusetts,
USA. Further description of lipoproteins can be found in Zuckermann et al PCT/US97/14465.
F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired
polynucleotide/polypeptide to be delivered.

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are capable of
neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired location. These agents
have in vitro, ex vivo, and in vivo applications. Polycationic agents can be used to deliver nucleic acids
to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin, DNA
binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such as φX174.
Transcriptional factors also contain domains that bind DNA and therefore may be useful as nucleic acid
condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF,
Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that bind DNA sequences.
Organic polycationic agents include: spermine, spermidine, and putrescine.
The dimensions and the physical properties of a polycationic agent can be extrapolated from the list
above, to construct other polypeptide polycationic agents or to produce synthetic polycationic agents.
Synthetic polycationic agents which are useful include, for example, DEAE-dextran, polybrene.
Lipofectin™, and lipofectAMINE™ are monomers that form polycationic complexes when combined with
polynucleotides/polypeptides.
Immunodiagnostic Assays

Streptococcus antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Streptococcus antibodies can be used to detect antigen levels). Immunoassays based on
well defined, recombinant antigens can be developed to replace invasive diagnostic methods. Antibodies to
Streptococcus proteins within biological samples, including for example, blood or serum samples, can be
detected. Design of the immunoassays is subject to a great deal of variation, and a variety of these are
known in the art. Protocols for the immunoassay may be based, for example, upon competition, or direct
reaction, or sandwich type assays. Protocols may also, for example, use solid supports, or may be by
immunoprecipitation. Most assays involve the use of labeled antibody or polypeptide; the labels may be, for
example, fluorescent, chemiluminescent, radioactive, or dye molecules. Assays which amplify the signals
from the probe are also known; examples of which are assays which utilize biotin and avidin, and
enzyme-labeled and mediated immunoassays, such as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed by
packaging the appropriate materials, including the compositions of the invention, in suitable containers,
along with the remaining reagents and materials (for example, suitable buffers, salt solutions, etc.) required
for the conduct of the assay, as well as a suitable set of assay instructions.
Use of Polypeptides to Screen for Peptide Analogs and Antagonists

Polypeptides encoded by the instant polynucleotides and corresponding full length genes can be used to
screen peptide libraries to identify binding partners, such as receptors, from within the library. Peptide
libraries can be synthesized according to methods known in the art (e.g. US patent 5,010,175;
WO91/17823). Agonists or antagonists of the polypeptides of the invention can be screened using any
available method known in the art, such as signal transduction, antibody binding, receptor binding,
mitogenic assays, chemotaxis assays, etc. The assay conditions ideally should resemble the conditions under
which the native activity is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength.
Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the native activity at
concentrations that do not cause toxic side effects in the subject. Agonists or antagonists that compete for
binding to the native polypeptide can require concentrations equal to or greater than the native
concentration, while inhibitors capable of binding irreversibly to the polypeptide can be added in
concentrations on the order of the native concentration.

Such screening and experimentation can lead to identification of a polypeptide binding partner, such as a
receptor, encoded by a gene or a cDNA corresponding to a polynucleotide described herein, and at least one
peptide agonist or antagonist of the binding partner. Such agonists and antagonists can be used to modulate,
enhance, or inhibit receptor function in cells to which the receptor is native, or in cells that possess the
receptor as a result of genetic engineering. Further, if the receptor shares biologically important
characteristics with a known receptor, information about agonist/antagonist binding can facilitate
development of improved agonists/antagonists of the known receptor.
Identification of anti-bacterial agents
Drug Screening Assays 

Of particular interest in the present invention is the identification of agents that have activity in modulating
expression of one or more of the adhesion-specific genes described herein, so as to inhibit infection and/or
disease. Of particular interest are screening assays for agents that have a low toxicity for human cells.
The term "agent" as used herein describes any molecule with the capability of altering or mimicking the
expression or physiological function of a gene product of a differentially expressed gene. Generally a
plurality of assay mixtures are run in parallel with different agent concentrations to obtain a differential
response to the various concentrations. Typically, one of these concentrations serves as a negative control,
i.e. at zero concentration or below the level of detection.
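
The parallel-concentration design described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the ten-fold dilution factor and the number of points are hypothetical choices for the example and are not taken from the specification.

```python
def concentration_series(top_conc, dilution_factor=10.0, n_points=4):
    """Return agent concentrations for a set of parallel assay mixtures.

    The first entry is the zero-concentration negative control; the
    remaining entries form a serial dilution starting at top_conc.
    All names and default values here are hypothetical.
    """
    if top_conc <= 0 or dilution_factor <= 1 or n_points < 1:
        raise ValueError("need top_conc > 0, dilution_factor > 1, n_points >= 1")
    series = [0.0]  # negative control: no agent added
    for i in range(n_points):
        series.append(top_conc / dilution_factor ** i)
    return series
```

Each returned value would correspond to one assay mixture run in parallel, with the units (for example micromolar) left to the user.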

Candidate agents encompass numerous chemical classes, including, but not limited to, organic molecules
(eg. small organic compounds having a molecular weight of more than 50 and less than about 2,500
daltons), peptides, antisense polynucleotides, and ribozymes, and the like. Candidate agents can comprise
functional groups necessary for structural interaction with proteins, particularly hydrogen bonding, and
typically include at least an amine, carbonyl, hydroxyl or carboxyl group, preferably at least two of the
functional chemical groups. The candidate agents often comprise cyclical carbon or heterocyclic structures
and/or aromatic or polyaromatic structures substituted with one or more of the above functional groups.
Candidate agents are also found among biomolecules including, but not limited to: polynucleotides,
peptides, saccharides, fatty acids, steroids, purines, pyrimidines, derivatives, structural analogs or
combinations thereof.

Candidate agents are obtained from a wide variety of sources including libraries of synthetic or natural
compounds. For example, numerous means are available for random and directed synthesis of a wide
variety of organic compounds and biomolecules, including expression of randomized oligonucleotides and
oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and
animal extracts are available or readily produced. Additionally, natural or synthetically produced libraries
and compounds are readily modified through conventional chemical, physical and biochemical means, and
may be used to produce combinatorial libraries. Known pharmacological agents may be subjected to
directed or random chemical modifications, such as acylation, alkylation, esterification, amidification, etc.
to produce structural analogs.
Screening of Candidate Agents In Vitro

A wide variety of in vitro assays may be used to screen candidate agents for the desired biological activity,
including, but not limited to, labeled in vitro protein-protein binding assays, protein-DNA binding assays
(e.g. to identify agents that affect expression), electrophoretic mobility shift assays, immunoassays for
protein binding, and the like. For example, by providing for the production of large amounts of a
differentially expressed polypeptide, one can identify ligands or substrates that bind to, modulate or mimic
the action of the polypeptide. The purified polypeptide may also be used for determination of
three-dimensional crystal structure, which can be used for modeling intermolecular interactions, transcriptional
regulation, etc.

The screening assay can be a binding assay, wherein one or more of the molecules may be joined to a label,
and the label directly or indirectly provides a detectable signal. Various labels include radioisotopes,
fluorescers, chemiluminescers, enzymes, specific binding molecules, particles, e.g. magnetic particles, and
the like. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin
etc. For the specific binding members, the complementary member would normally be labeled with a
molecule that provides for detection, in accordance with known procedures.

A variety of other reagents may be included in the screening assays described herein. Where the assay is a
binding assay, these include reagents like salts, neutral proteins, e.g. albumin, detergents, etc. that are used
to facilitate optimal protein-protein binding, protein-DNA binding, and/or reduce non-specific or
background interactions. Reagents that improve the efficiency of the assay, such as protease inhibitors,
nuclease inhibitors, anti-microbial agents, etc. may be used. The components of the mixture are added in any
order that provides for the requisite binding. Incubations are performed at any suitable temperature,
typically between 4 and 40°C. Incubation periods are selected for optimum activity, but may also be
optimized to facilitate rapid high-throughput screening. Typically between 0.1 and 1 hours will be
sufficient.

Many mammalian genes have homologs in yeast and lower animals. The study of such homologs' 
physiological role and interactions with other proteins in vivo or in vitro can facilitate understanding of 
biological function. In addition to model systems based on genetic complementation, yeast has been shown 
to be a powerful tool for studying protein-protein interactions through the two hybrid system. 
Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen bonding. 
Typically, one sequence will be fixed to a solid support and the other will be free in solution. Then, the two 
sequences will be placed in contact with one another under conditions that favor hydrogen bonding. Factors 
that affect this bonding include: the type and volume of solvent; reaction temperature; time of hybridization; 
agitation; agents to block the non-specific attachment of the liquid phase sequence to the solid support
(Denhardt's reagent or BLOTTO); concentration of the sequences; use of compounds to increase the rate of
association of sequences (dextran sulfate or polyethylene glycol); and the stringency of the washing
conditions following hybridization. See Sambrook et al [supra] Volume 2, chapter 9, pages 9.47 to 9.57.
"Stringency" refers to conditions in a hybridization reaction that favor association of very similar sequences
over sequences that differ. For example, a combination of temperature and salt concentration should be
chosen that is approximately 12° to 20°C below the calculated Tm of the hybrid under study. The
temperature and salt conditions can often be determined empirically in preliminary experiments in which 
samples of genomic DNA immobilized on filters are hybridized to the sequence of interest and then washed 
under conditions of different stringencies. See Sambrook et al at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the DNA
being blotted and (2) the homology between the probe and the sequences being detected. The total amount
of the fragment(s) to be studied can vary a magnitude of 10, from 0.1 to 1 µg for a plasmid or phage digest
to 10^-9 to 10^-8 g for a single copy gene in a highly complex eukaryotic genome. For lower complexity
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a smaller amount of
starting polynucleotides, and lower specific activity of probes can be used. For example, a single-copy yeast
gene can be detected with an exposure time of only 1 hour starting with 1 µg of yeast DNA, blotting for two
hours, and hybridizing for 4-8 hours with a probe of 10^8 cpm/µg. For a single-copy mammalian gene a
conservative approach would start with 10 µg of DNA, blot overnight, and hybridize overnight in the
presence of 10% dextran sulfate using a probe of greater than 10^8 cpm/µg, resulting in an exposure time of
~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe and the 
fragment of interest, and consequently, the appropriate conditions for hybridization and washing. In many 
cases the probe is not 100% homologous to the fragment. Other commonly encountered variables include
the length and total G+C content of the hybridizing sequences and the ionic strength and formamide content
of the hybridization buffer. The effects of all of these factors can be approximated by a single equation:

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch)
where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs (slightly
modified from Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284).
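
As a worked illustration, the equation can be evaluated directly. The sketch below assumes that Ci is expressed in mol/L and that the G+C, formamide and mismatch terms are percentages from 0 to 100; the function and parameter names are hypothetical and are not part of the specification.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid from the equation in the text.

    Assumptions (not stated in the specification): salt_molar is the
    monovalent ion concentration in mol/L, and the three *_percent
    arguments are percentages on a 0-100 scale.
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt_molar and length_bp must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)   # ionic strength term
            + 0.4 * gc_percent                # G+C content term
            - 0.6 * formamide_percent         # formamide lowers Tm
            - 600.0 / length_bp               # short hybrids melt lower
            - 1.5 * mismatch_percent)         # each 1% mismatch costs 1.5 deg C
```

For instance, a 600 bp hybrid with 50% G+C in 1 M monovalent salt, without formamide or mismatches, gives 81 + 0 + 20 - 0 - 1 - 0 = 100°C under these assumptions; adding 50% formamide lowers the estimate by 30°C.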

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration during the
washes are the simplest to adjust. As the temperature of the hybridization increases (ie. stringency), it
becomes less likely for hybridization to occur between strands that are nonhomologous, and as a result,
background decreases. If the radiolabeled probe is not completely homologous with the immobilized
fragment (as is frequently the case in gene family and interspecies hybridization experiments), the
hybridization temperature must be reduced, and background will increase. The temperature of the washes 
affects the intensity of the hybridizing band and the degree of background in a similar manner. The 
stringency of the washes is also increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a probe
which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, and 32°C for
85% to 90% homology. For lower homologies, formamide content should be lowered and temperature
adjusted accordingly, using the equation above. If the homology between the probe and the target fragment
is not known, the simplest approach is to start with both hybridization and wash conditions which are
nonstringent. If non-specific bands or high background are observed after autoradiography, the filter can be
washed at high stringency and reexposed. If the time required for exposure makes this approach impractical,
several hybridization and/or washing stringencies should be tested in parallel. 
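
The temperature guidance for 50% formamide can be summarised as a simple lookup. This is a minimal sketch; the function name is hypothetical, and because the stated ranges share their end points (90% and 95%), the sketch assumes that a boundary value falls into the higher range.

```python
def hybridization_temp_50pct_formamide(homology_percent):
    """Return the convenient hybridization temperature (deg C) from the text.

    Applies only to 50% formamide. Returns None below 85% homology, where
    the text advises lowering formamide and recalculating from the Tm
    equation instead. Boundary handling (>=) is an assumption.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology_percent must be between 0 and 100")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None  # no fixed recommendation in the text for lower homology
```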
Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid probes
according to the invention can determine the presence of cDNA or mRNA. A probe is said to "hybridize"
with a sequence of the invention if it can form a duplex or double stranded complex, which is stable enough
to be detected.

The nucleic acid probes will hybridize to the Streptococcus nucleotide sequences of the invention (including 
both sense and antisense strands). Though many different nucleotide sequences will encode the amino acid 
sequence, the native Streptococcal sequence is preferred because it is the actual sequence present in cells. 
mRNA represents a coding sequence and so a probe should be complementary to the coding sequence; 
single-stranded cDNA is complementary to mRNA, and so a cDNA probe should be complementary to the 
non-coding sequence. 
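
The strand relationships described above can be illustrated with a short sketch. The function names and the example sequence are hypothetical, and the sketch assumes DNA probes written 5' to 3' using only the letters A, C, G and T.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_for(coding_seq, target):
    """Return a DNA probe sequence (5' to 3') for the given target type.

    mRNA carries the coding sequence, so its probe is the reverse
    complement of the coding strand. First-strand cDNA is complementary
    to mRNA, so its probe has the coding-strand sequence itself.
    """
    if target == "mRNA":
        return reverse_complement(coding_seq)
    if target == "cDNA":
        return coding_seq
    raise ValueError("target must be 'mRNA' or 'cDNA'")
```

For a hypothetical coding fragment ATGGCC, the mRNA probe would be GGCCAT and the cDNA probe would be ATGGCC.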

The probe sequence need not be identical to the Streptococcal sequence (or its complement) — some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe can 
form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can include 
additional nucleotides to stabilize the formed duplex. Additional Streptococcus sequence may also be 
helpful as a label to detect the formed duplex. For example, a non-complementary nucleotide sequence may 
be attached to the 5' end of the probe, with the remainder of the probe sequence being complementary to a
Streptococcus sequence. Alternatively, non-complementary bases or longer sequences can be interspersed
into the probe, provided that the probe sequence has sufficient complementarity with a Streptococcus
sequence in order to hybridize therewith and thereby form a duplex which can be detected. 
The exact length and sequence of the probe will depend on the hybridization conditions (e.g. temperature, 
salt condition etc.). For example, for diagnostic applications, depending on the complexity of the analyte 
sequence, the nucleic acid probe typically contains at least 10-20 nucleotides, preferably 15-25, and more 
preferably at least 30 nucleotides, although it may be shorter than this. Short primers generally require 
cooler temperatures to form sufficiently stable hybrid complexes with the template. 
Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al [J. Am.
Chem. Soc. (1981) 103:3185], or according to Urdea et al [Proc. Natl. Acad. Sci. USA (1983) 80: 7461], or
using commercially available automated oligonucleotide synthesizers. 

The chemical nature of the probe can be selected according to preference. For certain applications, DNA or 
RNA are appropriate. For other applications, modifications may be incorporated eg. backbone
modifications, such as phosphorothioates or methylphosphonates, can be used to increase in vivo half-life,
alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer (1995) Curr Opin Biotechnol
6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as peptide nucleic acids may also be used
[eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et al. (1993) TIBTECH 11:384-386].
Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting small
amounts of target nucleic acid. The assay is described in Mullis et al [Meth. Enzymol. (1987) 155:335-350]
& US patents 4,683,195 & 4,683,202. Two "primer" nucleotides hybridize with the target nucleic acids and
are used to prime the reaction. The primers can comprise sequence that does not hybridize to the sequence
of the amplification target (or its complement) to aid with duplex stability or, for example, to incorporate a
convenient restriction site. Typically, such sequence will flank the desired Streptococcus sequence.
A thermostable polymerase creates copies of target nucleic acids from the primers using the original target
nucleic acids as a template. After a threshold amount of target nucleic acids are generated by the
polymerase, they can be detected by more traditional methods, such as Southern blots. When using the
Southern blot method, the labelled probe will hybridize to the Streptococcus sequence (or its complement).
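
A primer carrying non-hybridizing flanking sequence, as described above, can be sketched as a simple concatenation. This is an illustration only: the function name, the two-base 5' clamp and the example annealing sequence are hypothetical, and the EcoRI recognition site GAATTC is used merely as one example of a convenient restriction site.

```python
def tailed_primer(annealing_region, restriction_site="GAATTC", clamp="GC"):
    """Build a primer (5' to 3') with a non-hybridizing 5' tail.

    Layout: 5'-clamp + restriction_site + annealing_region-3'.
    Only annealing_region is intended to hybridize to the target; the
    clamp and default site are assumptions chosen for illustration.
    """
    for part in (annealing_region, restriction_site, clamp):
        if not set(part) <= set("ACGT"):
            raise ValueError("sequences must contain only A, C, G and T")
    if not annealing_region:
        raise ValueError("annealing_region must not be empty")
    return clamp + restriction_site + annealing_region
```

Keeping the tail at the 5' end leaves the 3' end of the primer fully matched to the target, which is where the polymerase begins extension.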

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et
al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified and
separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid support, such
as nitrocellulose. The solid support is exposed to a labelled probe and then washed to remove any
unhybridized probe. Next, the duplexes containing the labeled probe are detected. Typically, the probe is
labelled with a radioactive moiety.